Usage
adam(stepsize = 0.05, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)
Arguments
- stepsize
step size (learning rate) for the stochastic gradient update
- beta1
exponential decay rate for the first-moment (mean) estimate of the gradient
- beta2
exponential decay rate for the second-moment (uncentered variance) estimate of the gradient
- epsilon
small constant added to the denominator of the update for numerical stability
Value
a list of control variables for optimization, for use in the control_opt function
Details
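The update rule for Adam is given below, where $g_t$ is the gradient at iteration $t$, $x_t$ is the parameter vector, and the square, square root, and division are applied elementwise: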
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
$$\hat{m}_t = m_t / (1 - \beta_1^t)$$
$$\hat{v}_t = v_t / (1 - \beta_2^t)$$
$$x_{t+1} = x_t - \text{stepsize} \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
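The update rule above can be sketched in plain R as follows. This is only an illustration of the equations, not the package's internal implementation: the helper `adam_step` and the `state` list are hypothetical names, and the sketch assumes the moments start at zero ($m_0 = v_0 = 0$). The defaults mirror those of `adam()`.

```r
# Hypothetical helper (not part of the package): one Adam update of the
# parameters x given the gradient g. `state` carries m, v and the counter t.
adam_step <- function(x, g, state, stepsize = 0.05, beta1 = 0.9,
                      beta2 = 0.999, epsilon = 1e-08) {
  t <- state$t + 1
  m <- beta1 * state$m + (1 - beta1) * g      # first-moment estimate m_t
  v <- beta2 * state$v + (1 - beta2) * g^2    # second-moment estimate v_t
  m_hat <- m / (1 - beta1^t)                  # bias-corrected first moment
  v_hat <- v / (1 - beta2^t)                  # bias-corrected second moment
  x_new <- x - stepsize * m_hat / (sqrt(v_hat) + epsilon)
  list(x = x_new, state = list(m = m, v = v, t = t))
}

# Toy problem: minimise f(x) = sum(x^2), whose gradient is 2 * x.
x <- c(1, -2)
state <- list(m = numeric(2), v = numeric(2), t = 0)  # assumes m_0 = v_0 = 0
for (i in 1:500) {
  out <- adam_step(x, 2 * x, state)
  x <- out$x
  state <- out$state
}
```

On the very first step the bias correction gives $\hat{m}_1 = g_1$ and $\hat{v}_1 = g_1^2$, so each coordinate moves by roughly `stepsize` in the direction opposite to the sign of its gradient, regardless of the gradient's magnitude.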