AdaGrad SGD optimization
adagrad(stepsize = 0.05, epsilon = 1e-08)
stepsize: step size (learning rate) for SGD
epsilon: small constant added to the denominator for numerical stability
Returns a list of control variables for optimization (used in the control_opt function).
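The update rule for AdaGrad is: $$v_t = v_{t-1} + g_t^2$$ $$x_{t+1} = x_t - \text{stepsize} \cdot \frac{g_t}{\sqrt{v_t} + \epsilon}$$ Here g_t is the gradient at iteration t, v_t is the running sum of squared gradients, and the square, square root, and division are applied elementwise.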
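
The update rule above can be sketched in a few lines of plain R. This is only an illustration of the formula, not the package's internal implementation: `adagrad_step` is a hypothetical helper, the accumulator is assumed to start at zero, and the quadratic objective is a made-up test problem.

```r
# Hypothetical helper (not part of the package): one AdaGrad update.
# x: current parameters, v: running sum of squared gradients, grad: gradient at x.
adagrad_step <- function(x, v, grad, stepsize = 0.05, epsilon = 1e-08) {
  v <- v + grad^2                                  # v_t = v_{t-1} + g_t^2
  x <- x - stepsize * grad / (sqrt(v) + epsilon)   # elementwise scaled step
  list(x = x, v = v)
}

# Toy problem (assumption): minimize f(x) = sum(x^2), whose gradient is 2 * x.
x <- c(1, -2)
v <- numeric(length(x))   # assumed initial accumulator v_0 = 0
for (t in 1:500) {
  state <- adagrad_step(x, v, grad = 2 * x)
  x <- state$x
  v <- state$v
}
```

Because each coordinate is divided by the square root of its own accumulated squared gradients, the very first step moves every coordinate by roughly `stepsize` regardless of the gradient's scale, and later steps shrink as `v` grows.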