
AmsGrad Optimizer. In some cases, adaptive learning-rate methods such as AdaMax fail to converge to the optimal solution because the exponential moving average over past gradients forgets large, informative gradients too quickly. To address this problem, Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar proposed the AmsGrad update algorithm. The update rule for $\mathbf{v}$ (equivalent to $E[g^2]$ in AdaDelta and $\mathbf{s}$ in RMSProp) is modified such that $v^\prime_k \geq v_k$ is guaranteed, giving the algorithm a "long-term memory" of past gradients. The vectors $\mathbf{m}$ and $\mathbf{v}$ are initialized to zero, and the parameters $\mathbf{p}$ are updated as

$$
\begin{aligned}
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, G_k, \\
v_k &= \beta_2 v_{k-1} + (1-\beta_2)\, G_k^2, \\
v^\prime_k &= \max(v^\prime_{k-1}, v_k), \\
p_{k+1} &= p_k - \frac{\eta}{\sqrt{v^\prime_k} + \epsilon}\, m_k,
\end{aligned}
$$

where $G_k$ denotes the gradient at step $k$.
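The update rule above can be sketched in plain NumPy. This is a minimal illustration of the algorithm, not NetKet's implementation; the function name `amsgrad_step` and the toy quadratic objective are chosen here for demonstration:

```python
import numpy as np

def amsgrad_step(p, g, m, v, vhat, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One AmsGrad update; p are parameters, g the current gradient."""
    m = beta1 * m + (1 - beta1) * g         # first-moment EMA
    v = beta2 * v + (1 - beta2) * g**2      # second-moment EMA
    vhat = np.maximum(vhat, v)              # enforce v'_k >= v_k ("long-term memory")
    p = p - lr * m / (np.sqrt(vhat) + eps)  # parameter update
    return p, m, v, vhat

# Minimize f(p) = p^2 for a single scalar parameter.
p = np.array([1.0])
m = v = vhat = np.zeros(1)
for _ in range(2000):
    g = 2 * p                               # gradient of p^2
    p, m, v, vhat = amsgrad_step(p, g, m, v, vhat, lr=0.01)
```

The `np.maximum` line is the only difference from a plain Adam-style update: the denominator can never shrink, so a single large gradient permanently caps the effective step size.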

## Class Constructor

Constructs a new AmsGrad optimizer.

| Argument | Type | Description |
| --- | --- | --- |
| `learning_rate` | `float=0.001` | The learning rate $\eta$. |
| `beta1` | `float=0.9` | Exponential decay rate for the first moment estimate. |
| `beta2` | `float=0.999` | Exponential decay rate for the second moment estimate. |
| `epscut` | `float=1e-07` | Small cutoff $\epsilon$ added to the denominator for numerical stability. |

### Examples

>>> from netket.optimizer import AmsGrad
>>> op = AmsGrad(learning_rate=0.01)