Momentum SGD optimization
momentum(stepsize = 0.05, beta1 = 0.9, beta2 = 1 - beta1)
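stepsize: step size (learning rate) for SGD
beta1: momentum coefficient applied to the previous velocity
beta2: coefficient applied to the current gradient (default: 1 - beta1)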
A list of control variables for optimization (used in the control_opt function).
The update rule for momentum is:
$$v_t = \beta_1 v_{t-1} + \beta_2 g_t$$
$$x_{t+1} = x_t - \text{stepsize} \cdot v_t$$
Here $g_t$ is the gradient evaluated at $x_t$ and $v_t$ is the accumulated velocity.
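The update rule above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `momentum_step` is a hypothetical helper, the initial velocity of 0 is an assumption (the documentation does not state how the velocity is initialized), and the toy objective f(x) = x^2 / 2 uses an exact gradient rather than a stochastic one.

```r
# Hypothetical helper (not part of the documented package): one momentum update.
# Defaults mirror the documented signature, including beta2 = 1 - beta1.
momentum_step <- function(x, v, grad,
                          stepsize = 0.05, beta1 = 0.9, beta2 = 1 - beta1) {
  v_new <- beta1 * v + beta2 * grad  # v_t = beta1 * v_{t-1} + beta2 * g_t
  x_new <- x - stepsize * v_new      # x_{t+1} = x_t - stepsize * v_t
  list(x = x_new, v = v_new)
}

# Toy objective f(x) = x^2 / 2, whose gradient at x is simply x.
x <- 1
v <- 0  # assumed initial velocity
for (t in 1:500) {
  state <- momentum_step(x, v, grad = x)
  x <- state$x
  v <- state$v
}
```

With the default settings, the first step from x = 1 and v = 0 gives v = 0.1 and x = 0.995. Because beta2 defaults to 1 - beta1, the velocity is an exponentially weighted average of past gradients, so each step is on the scale of stepsize times a typical gradient.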