AmsGrad Optimizer. In some cases, adaptive learning rate methods such as AdaMax fail to converge to the optimal solution because of the exponential moving average over past gradients. To address this problem, Sashank J. Reddi, Satyen Kale and Sanjiv Kumar proposed the AmsGrad update algorithm. The update rule for $\mathbf{v}$ (equivalent to $E[g^2]$ in AdaDelta and $\mathbf{s}$ in RMSProp) is modified such that $v^\prime_k \geq v_k$ is guaranteed, giving the algorithm a "long-term memory" of past gradients. The vectors $\mathbf{m}_k$ and $\mathbf{v}_k$ are initialized to zero, and the parameters $\mathbf{p}_k$ are updated as

$$
\begin{aligned}
m_k &= \beta_1 m_{k-1} + (1-\beta_1) G_k,\\
v_k &= \beta_2 v_{k-1} + (1-\beta_2) G_k^2,\\
v^\prime_k &= \max(v^\prime_{k-1}, v_k),\\
p_k &= p_{k-1} - \frac{\eta}{\sqrt{v^\prime_k} + \epsilon}\, m_k,
\end{aligned}
$$

where $G_k$ is the gradient at step $k$.
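The update above can be sketched in plain NumPy. This is an illustrative re-implementation, not NetKet's internal code; the function `amsgrad_step` and its signature are invented for this sketch, with defaults matching the constructor arguments below:

```python
import numpy as np

def amsgrad_step(p, grad, m, v, v_hat,
                 lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One AmsGrad update; returns the new parameters and optimizer state."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (moving average of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (moving average of squared gradients)
    v_hat = np.maximum(v_hat, v)              # "long-term memory": v_hat never decreases
    p = p - lr * m / (np.sqrt(v_hat) + eps)   # parameter update
    return p, m, v, v_hat

# Minimize f(p) = p^2 starting from p = 1.
p = np.array([1.0])
m = v = v_hat = np.zeros_like(p)
for _ in range(2000):
    grad = 2.0 * p                            # gradient of f
    p, m, v, v_hat = amsgrad_step(p, grad, m, v, v_hat, lr=0.01)
# p is now close to the minimum at 0
```

Because `v_hat` is clamped with `np.maximum`, the per-parameter effective learning rate $\eta / (\sqrt{v^\prime_k} + \epsilon)$ is non-increasing, which is the key difference from RMSProp-style updates.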

Class Constructor

Constructs a new AmsGrad optimizer.

Argument        Type          Description
learning_rate   float=0.001   The learning rate $\eta$.
beta1           float=0.9     First exponential decay rate.
beta2           float=0.999   Second exponential decay rate.
epscut          float=1e-07   Small cutoff $\epsilon$ added to the denominator for numerical stability.


Example: a simple AmsGrad optimizer with default parameters.

>>> from netket.optimizer import AmsGrad
>>> op = AmsGrad()

Class Methods


Member function that resets the internal state of the optimizer.