netket.optimizer.AdaGrad
class netket.optimizer.AdaGrad
AdaGrad Optimizer. In many cases, e.g. in Sgd, the learning rate \(\eta\) should decay as a function of the training iteration to prevent overshooting as the optimum is approached. AdaGrad is an adaptive learning rate algorithm that automatically scales the learning rate with a sum over past gradients. The vector \(\mathbf{g}\) is initialized to zero. Given a stochastic estimate of the gradient of the cost function \(G(\mathbf{p})\), the updates for \(g_k\) and the parameter \(p_k\) are
\[\begin{split}g^\prime_k &= g_k + G_k(\mathbf{p})^2\\
p^\prime_k &= p_k - \frac{\eta}{\sqrt{g^\prime_k + \epsilon}}G_k(\mathbf{p})\end{split}\]
AdaGrad has been shown to perform particularly well when the gradients are sparse, but the learning rate may become too small after many updates because the sum over the squares of past gradients is cumulative.
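As an illustration of the update rule above, here is a minimal NumPy sketch of one AdaGrad step. This is only a didactic sketch of the mathematics, not NetKet's internal implementation; the function adagrad_step and the toy values are invented for this example.

>>> import numpy as np
>>> def adagrad_step(p, grad, g, eta=0.001, epscut=1e-7):
...     g = g + grad ** 2                          # g'_k = g_k + G_k(p)^2
...     p = p - eta / np.sqrt(g + epscut) * grad   # p'_k = p_k - eta / sqrt(g'_k + eps) * G_k(p)
...     return p, g
>>> p, g = np.array([1.0, -2.0]), np.zeros(2)      # the accumulator g is initialized to zero
>>> p, g = adagrad_step(p, grad=p.copy(), g=g, eta=0.1)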
__init__(self: netket._C_netket.optimizer.AdaGrad, learning_rate: float = 0.001, epscut: float = 1e-07) → None
Constructs a new AdaGrad optimizer.

Parameters
learning_rate – Learning rate \(\eta\).
epscut – Small \(\epsilon\) cutoff.
Examples
Simple AdaGrad optimizer.

>>> from netket.optimizer import AdaGrad
>>> op = AdaGrad()
Methods

__init__(self, learning_rate, epscut) – Constructs a new AdaGrad optimizer.
init(self, arg0, arg1)
reset(self) – Member function resetting the internal state of the optimizer.
update(*args, **kwargs) – Overloaded function.
init(self: netket._C_netket.optimizer.Optimizer, arg0: int, arg1: bool) → None
reset(self: netket._C_netket.optimizer.Optimizer) → None
Member function resetting the internal state of the optimizer.
update(*args, **kwargs)
Overloaded function.

update(self: netket._C_netket.optimizer.Optimizer, grad: numpy.ndarray[float64[m, 1]], param: numpy.ndarray[float64[m, 1], flags.writeable]) -> None
Update param by applying a gradient-based optimization step using grad.

update(self: netket._C_netket.optimizer.Optimizer, grad: numpy.ndarray[complex128[m, 1]], param: numpy.ndarray[complex128[m, 1], flags.writeable]) -> None
Update param by applying a gradient-based optimization step using grad.
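A hedged usage sketch follows (it is not taken from the NetKet documentation). It drives the optimizer by hand on a toy quadratic cost \(\frac{1}{2}\lVert\mathbf{p}\rVert^2\), whose gradient is simply \(\mathbf{p}\). It assumes that init(n, flag) sizes the internal state before the first update, that the boolean flag (not documented on this page) does not affect this toy run, and that the binding accepts contiguous 1-D float64 arrays for grad and param.

>>> import numpy as np
>>> from netket.optimizer import AdaGrad
>>> op = AdaGrad(learning_rate=0.1)
>>> params = np.array([1.0, -2.0, 0.5])            # float64 parameter vector
>>> op.init(params.size, False)                    # sizes the internal state; the flag's meaning is an assumption
>>> for _ in range(100):
...     grad = params.copy()                       # gradient of 0.5*||p||^2 is p itself
...     op.update(grad, params)                    # in-place gradient step on params

After the loop, params has shrunk towards zero.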