Common Optimizers

SGD(learning_rate[, momentum, weight_decay, ...])

The stochastic gradient descent optimizer.
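
A minimal sketch of building and stepping this optimizer, assuming the classes listed here come from the `mlx.optimizers` module (not stated in this section) and using `mlx.nn` for a toy model; the hyperparameter values are illustrative, not library defaults:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Toy model and scalar loss; any nn.Module works the same way.
model = nn.Linear(10, 1)

def loss_fn(model, x, y):
    return mx.mean((model(x) - y) ** 2)

loss_and_grad_fn = nn.value_and_grad(model, loss_fn)

# SGD with momentum and weight decay (illustrative values).
optimizer = optim.SGD(learning_rate=1e-2, momentum=0.9, weight_decay=1e-4)

x = mx.random.normal((32, 10))
y = mx.random.normal((32, 1))

loss, grads = loss_and_grad_fn(model, x, y)
optimizer.update(model, grads)                # apply one SGD step in place
mx.eval(model.parameters(), optimizer.state)  # force the lazy computation
```

The same `loss_and_grad_fn` / `update` / `eval` pattern applies to every optimizer below; the later sketches only show construction.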

RMSprop(learning_rate[, alpha, eps])

The RMSprop optimizer [1].
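
A construction sketch under the same `mlx.optimizers` assumption as above; `alpha` is the smoothing constant for the running average of squared gradients and `eps` guards the division (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.RMSprop(learning_rate=1e-3, alpha=0.99, eps=1e-8)
```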

Adagrad(learning_rate[, eps])

The Adagrad optimizer [1].
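
A construction sketch under the same assumption; Adagrad scales each parameter's step by its accumulated squared gradients, with `eps` for numerical stability (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.Adagrad(learning_rate=1e-2, eps=1e-8)
```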

Adafactor([learning_rate, eps, ...])

The Adafactor optimizer.
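
Per the signature above, `learning_rate` is optional; Adafactor can fall back on internally computed relative step sizes. A hedged sketch, again assuming `mlx.optimizers`:

```python
import mlx.optimizers as optim

# No explicit learning rate: rely on Adafactor's relative step sizes.
optimizer = optim.Adafactor()
```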

AdaDelta(learning_rate[, rho, eps])

The AdaDelta optimizer with a learning rate [1].
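
A construction sketch under the same assumption; `rho` is the decay rate of the running averages of squared gradients and squared updates, and `eps` adds stability (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.AdaDelta(learning_rate=1.0, rho=0.9, eps=1e-6)
```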

Adam(learning_rate[, betas, eps])

The Adam optimizer [1].
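
A construction sketch under the same assumption; `betas` are the exponential decay rates of the first- and second-moment estimates and `eps` stabilizes the denominator (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.Adam(learning_rate=1e-3, betas=[0.9, 0.999], eps=1e-8)
```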

AdamW(learning_rate[, betas, eps, weight_decay])

The AdamW optimizer [1].
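
A construction sketch under the same assumption; AdamW applies `weight_decay` decoupled from the gradient-based update (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.AdamW(learning_rate=1e-3, betas=[0.9, 0.999], eps=1e-8,
                        weight_decay=0.01)
```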

Adamax(learning_rate[, betas, eps])

The Adamax optimizer, a variant of Adam based on the infinity norm [1].
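
A construction sketch under the same assumption; the infinity-norm variant takes the same `betas` and `eps` arguments as Adam (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.Adamax(learning_rate=2e-3, betas=[0.9, 0.999], eps=1e-8)
```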

Lion(learning_rate[, betas, weight_decay])

The Lion optimizer [1].
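
A construction sketch under the same assumption; because Lion uses the sign of its momentum-based update, it is typically run with a smaller learning rate and a larger weight decay than Adam-style optimizers (illustrative values):

```python
import mlx.optimizers as optim

optimizer = optim.Lion(learning_rate=1e-4, betas=[0.9, 0.99], weight_decay=1e-2)
```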