Adam SGD optimization
adam(stepsize = 0.05, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)
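stepsize: step size (learning rate) for the SGD update
beta1: exponential decay rate for the first-moment (mean) estimate in Adam
beta2: exponential decay rate for the second-moment (uncentered variance) estimate in Adam
epsilon: small constant added to the denominator for numerical stability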
A list of control variables for optimization (used in the control_opt function).
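The update rule for Adam is:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
$$\hat{m}_t = m_t / (1 - \beta_1^t)$$
$$\hat{v}_t = v_t / (1 - \beta_2^t)$$
$$x_{t+1} = x_t - \text{stepsize} \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
Here $g_t$ is the gradient of the objective at step $t$ and $g_t^2$ is its elementwise square. $m_t$ and $v_t$ are running estimates of the first and second moments of the gradient, and $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions.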
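The update rule above can be sketched in plain R to show how the five equations fit together. This is a minimal illustration, not the package's implementation: `adam_step`, `grad_f`, and `target` are hypothetical names introduced here, the moment estimates are assumed to start at zero, and the toy objective is a simple quadratic chosen only because its gradient is easy to write down.

```r
# Hypothetical helper (not part of the package): one Adam update step.
# Defaults mirror the defaults shown in the usage of adam().
adam_step <- function(x, g, m, v, t,
                      stepsize = 0.05, beta1 = 0.9, beta2 = 0.999,
                      epsilon = 1e-08) {
  m <- beta1 * m + (1 - beta1) * g      # first-moment estimate m_t
  v <- beta2 * v + (1 - beta2) * g^2    # second-moment estimate v_t (elementwise)
  m_hat <- m / (1 - beta1^t)            # bias-corrected first moment
  v_hat <- v / (1 - beta2^t)            # bias-corrected second moment
  x <- x - stepsize * m_hat / (sqrt(v_hat) + epsilon)
  list(x = x, m = m, v = v)
}

# Toy objective (an assumption for illustration):
# f(x) = sum((x - target)^2), with gradient 2 * (x - target).
target <- c(1, -2)
grad_f <- function(x) 2 * (x - target)

# Assumed initialisation: parameters and both moment estimates start at zero.
state <- list(x = c(0, 0), m = c(0, 0), v = c(0, 0))
for (t in 1:500) {
  state <- adam_step(state$x, grad_f(state$x), state$m, state$v, t)
}
print(state$x)
```

On the first step the bias correction gives $\hat{m}_1 = g_1$ and $\hat{v}_1 = g_1^2$, so each coordinate moves by roughly `stepsize` in the direction opposite its gradient sign, whatever the gradient's magnitude. On this toy problem the iterates should then settle close to `target`.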