AdaMax Optimizer. AdaMax is an adaptive stochastic gradient descent method, and a variant of Adam based on the infinity norm. In contrast to the SGD, AdaMax offers the important advantage of being much less sensitive to the choice of the hyper-parameters (for example, the learning rate).
Given a stochastic estimate of the gradient of the cost function (), AdaMax performs an update:
where implicitly depends on all the history of the optimization up to the current point. The NetKet naming convention of the parameters strictly follows the one introduced by the authors of AdaMax. For an in-depth description of this method, please refer to Kingma, D., & Ba, J. (2015). Adam: a method for stochastic optimization (Algorithm 2 therein).
Constructs a new
|alpha||float=0.001||The step size.|
|beta1||float=0.9||First exponential decay rate.|
|beta2||float=0.999||Second exponential decay rate.|
|epscut||float=1e-07||Small epsilon cutoff.|
Simple AdaMax optimizer.
>>> from netket.optimizer import AdaMax >>> op = AdaMax()
Member function resetting the internal state of the optimizer.