mlx.optimizers.Adamax


class Adamax(learning_rate: float | Callable[[array], array], betas: List[float] = [0.9, 0.999], eps: float = 1e-08)

The Adamax optimizer, a variant of Adam based on the infinity norm [1].

Our Adamax implementation follows the original paper and omits the bias correction in the first and second moment estimates. In detail,

\[
\begin{aligned}
m_{t+1} &= \beta_1 m_t + (1 - \beta_1) g_t \\
v_{t+1} &= \max(\beta_2 v_t, |g_t|) \\
w_{t+1} &= w_t - \lambda \frac{m_{t+1}}{v_{t+1} + \epsilon}
\end{aligned}
\]

[1]: Kingma, D.P. and Ba, J., 2015. Adam: A method for stochastic optimization. ICLR 2015.
Parameters:
  • learning_rate (float or callable) – The learning rate λ.

  • betas (Tuple[float, float], optional) – The coefficients (β₁, β₂) used for computing running averages of the gradient and its square. Default: (0.9, 0.999)

  • eps (float, optional) – The term ϵ added to the denominator to improve numerical stability. Default: 1e-8
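
For context, here is a minimal usage sketch that constructs this optimizer and runs one training step. The linear model, the mean-squared-error loss, and the synthetic data are illustrative assumptions, not part of this reference.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Illustrative model, loss, and data (assumptions for this sketch).
model = nn.Linear(4, 1)
optimizer = optim.Adamax(learning_rate=1e-3, betas=[0.9, 0.999], eps=1e-8)

def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y)

x = mx.random.normal((32, 4))
y = mx.random.normal((32, 1))

# One training step: compute loss and gradients, then apply the Adamax update.
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
loss, grads = loss_and_grad_fn(model, x, y)
optimizer.update(model, grads)
mx.eval(model.parameters(), optimizer.state)
```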

Methods

__init__(learning_rate[, betas, eps])

apply_single(gradient, parameter, state)

Performs the Adamax parameter update and stores v and m in the optimizer state.

init_single(parameter, state)

Initialize the optimizer state.
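
To make the update rule concrete, the following is a hedged sketch of the step that apply_single performs, written directly from the equations above. The standalone function name, the dict-based state, and the keyword defaults are assumptions for illustration; the actual method is implemented inside the Adamax class.

```python
import mlx.core as mx

def adamax_apply_single(gradient, parameter, state, lr=1e-3,
                        betas=(0.9, 0.999), eps=1e-8):
    # Hypothetical free-function version of the update; state is a plain dict here.
    b1, b2 = betas
    m = state.get("m", mx.zeros_like(gradient))
    v = state.get("v", mx.zeros_like(gradient))
    m = b1 * m + (1 - b1) * gradient           # first moment, no bias correction
    v = mx.maximum(b2 * v, mx.abs(gradient))   # infinity-norm second moment
    state["m"], state["v"] = m, v              # store v and m in the optimizer state
    return parameter - lr * m / (v + eps)      # updated parameter
```

The max in the second moment update is what distinguishes Adamax from Adam: the exponentially weighted average of squared gradients is replaced by an exponentially weighted infinity norm.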