mlx.nn.LayerNorm#

class LayerNorm(dims: int, eps: float = 1e-05, affine: bool = True, bias: bool = True)#

Applies layer normalization [1] on the inputs.

Computes

\[y = \frac{x - E[x]}{\sqrt{Var[x]} + \epsilon} \gamma + \beta,\]

where \(\gamma\) and \(\beta\) are learned per feature dimension parameters initialized at 1 and 0 respectively.

[1]: https://arxiv.org/abs/1607.06450

Parameters:
  • dims (int) – The feature dimension of the input to normalize over

  • eps (float) – A small additive constant for numerical stability

  • affine (bool) – If True learn an affine transform to apply after the normalization

  • bias (bool) – If True include a translation to the affine transformation. If set to False the transformation is not really affine just scaling.

Methods