mlx.optimizers.Adamax#
- class Adamax(learning_rate: float | Callable[[array], array], betas: List[float] = [0.9, 0.999], eps: float = 1e-08)#
The Adamax optimizer, a variant of Adam based on the infinity norm [1].
Our Adam implementation follows the original paper and omits the bias correction in the first and second moment estimates. In detail,

\[
\begin{aligned}
m_{t+1} &= \beta_1 m_t + (1 - \beta_1) g_t \\
v_{t+1} &= \max(\beta_2 v_t, |g_t|) \\
w_{t+1} &= w_t - \lambda \frac{m_{t+1}}{v_{t+1} + \epsilon}
\end{aligned}
\]

[1]: Kingma, D.P. and Ba, J., 2015. Adam: A method for stochastic optimization. ICLR 2015.
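As an illustration of the update rule above, here is a minimal sketch that applies one Adamax step to a single parameter using plain mlx.core operations. The arrays, hyperparameter values, and variable names (`w`, `g`, `m`, `v`) are chosen for the example and are not the optimizer's internal state.

```python
import mlx.core as mx

# Example hyperparameters matching the defaults above
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8

# A parameter, its gradient, and the running moment estimates
w = mx.array([0.5, -1.0, 2.0])
g = mx.array([0.1, -0.2, 0.3])
m = mx.zeros_like(w)  # first moment: exponential moving average of the gradient
v = mx.zeros_like(w)  # infinity-norm second moment

# One Adamax step, mirroring the equations in the description
m = beta1 * m + (1 - beta1) * g
v = mx.maximum(beta2 * v, mx.abs(g))
w = w - lr * m / (v + eps)

print(w)
```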
- Parameters:
  - learning_rate (float or callable) – The learning rate.
  - betas (Tuple[float, float], optional) – The coefficients used for computing running averages of the gradient and its square. Default: (0.9, 0.999)
  - eps (float, optional) – The term added to the denominator to improve numerical stability. Default: 1e-8
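A minimal usage sketch follows, assuming a small `nn.Linear` model trained on synthetic data; the model, loss function, and training values are illustrative and not part of the Adamax API.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Toy model and synthetic data (illustrative only)
model = nn.Linear(4, 1)
x = mx.random.normal((32, 4))
y = mx.random.normal((32, 1))

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

optimizer = optim.Adamax(learning_rate=1e-3, betas=[0.9, 0.999], eps=1e-8)
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)

for _ in range(10):
    loss, grads = loss_and_grad_fn(model, x, y)
    optimizer.update(model, grads)  # Adamax update of all model parameters
    mx.eval(model.parameters(), optimizer.state)
```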
Methods

- __init__(learning_rate[, betas, eps])
- apply_single(gradient, parameter, state) – Performs the Adamax parameter update and stores \(v\) and \(m\) in the optimizer state.
- init_single(parameter, state) – Initialize optimizer state