AmsGrad Optimizer

In some cases, adaptive learning rate methods such as AdaMax fail to converge to the optimal solution because of the exponential moving average over past gradients. To address this problem, Sashank J. Reddi, Satyen Kale and Sanjiv Kumar proposed the AmsGrad update algorithm. The update rule for $\mathbf{v}$ (equivalent to $E[G^2]$ in AdaDelta and $\mathbf{s}$ in RMSProp) is modified such that $v^\prime_k \geq v^\prime_{k-1}$ is guaranteed, giving the algorithm a “long-term memory” of past gradients. The vectors $\mathbf{m}_k$ and $\mathbf{v}_k$ are initialized to zero, and are updated together with the parameters $\mathbf{p}_k$:

$$
\begin{aligned}
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, G_k(\mathbf{p}) \\
v_k &= \beta_2 v_{k-1} + (1-\beta_2)\, G_k(\mathbf{p})^2 \\
v^\prime_k &= \max(v^\prime_{k-1}, v_k) \\
p_k &= p_{k-1} - \frac{\eta}{\sqrt{v^\prime_k} + \epsilon}\, m_k
\end{aligned}
$$

where $G_k(\mathbf{p})$ is the gradient at iteration $k$.
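For concreteness, here is a minimal NumPy sketch of a single AmsGrad step following the equations above. It is only an illustration, not NetKet's internal implementation, and the function and argument names are chosen for this example.

```python
import numpy as np

def amsgrad_step(p, grad, m, v, v_max,
                 eta=0.001, beta1=0.9, beta2=0.999, epscut=1e-7):
    """One AmsGrad update; all arrays share the shape of the parameters `p`."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate
    v_max = np.maximum(v_max, v)                  # enforce v'_k >= v'_{k-1}: the "long-term memory"
    p = p - eta * m / (np.sqrt(v_max) + epscut)   # parameter update
    return p, m, v, v_max
```

Here `grad` plays the role of $G_k(\mathbf{p})$, and `m`, `v`, `v_max` are the state vectors initialized to zero before the first step.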
Constructs a new AmsGrad optimizer.
| Argument        | Type            | Description                    |
|-----------------|-----------------|--------------------------------|
| `learning_rate` | `float = 0.001` | The learning rate $\eta$.      |
| `beta1`         | `float = 0.9`   | First exponential decay rate.  |
| `beta2`         | `float = 0.999` | Second exponential decay rate. |
| `epscut`        | `float = 1e-07` | Small epsilon cutoff.          |
Simple AmsGrad optimizer.
>>> from netket.optimizer import AmsGrad
>>> op = AmsGrad()
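Assuming the arguments listed in the table above are accepted as keyword arguments of the constructor, a customized optimizer can be created in the same way (the values below are arbitrary):

>>> from netket.optimizer import AmsGrad
>>> op = AmsGrad(learning_rate=0.01, beta1=0.87, beta2=0.93, epscut=1e-6)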
Member function that resets the internal state of the optimizer.
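Assuming this member function is exposed as `reset()` (the name is not given above, so it is an assumption), clearing the accumulated state would look like:

>>> op.reset()  # hypothetical method name; clears the optimizer's internal state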