mlx.nn.LayerNorm
- class LayerNorm(dims: int, eps: float = 1e-05, affine: bool = True, bias: bool = True)
Applies layer normalization [1] on the inputs.
Computes
\[y = \frac{x - E[x]}{\sqrt{Var[x]} + \epsilon} \gamma + \beta,\]
where \(\gamma\) and \(\beta\) are parameters learned per feature dimension, initialized to 1 and 0 respectively.
[1]: https://arxiv.org/abs/1607.06450
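As a rough illustration of the formula above, the computation for a single feature vector can be sketched in plain Python. The function name `layer_norm` and its arguments are hypothetical, not part of MLX, and the sketch assumes the population variance (dividing by the number of features). It follows the formula as written, adding \(\epsilon\) to the standard deviation; note that implementations commonly add it to the variance inside the square root instead, which gives nearly identical results for small \(\epsilon\).

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector (hypothetical helper, not the MLX API)."""
    n = len(x)
    mean = sum(x) / n
    # Assumption: population variance over the feature dimension.
    var = sum((v - mean) ** 2 for v in x) / n
    denom = math.sqrt(var) + eps
    # gamma and beta default to their initial values of 1 and 0.
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [(v - mean) / denom * g + b for v, g, b in zip(x, gamma, beta)]


y = layer_norm([1.0, 2.0, 3.0, 4.0])
print(y)
```

With the default \(\gamma = 1\) and \(\beta = 0\), the output has approximately zero mean and unit variance across the feature dimension.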
- Parameters:
dims (int) – The feature dimension of the input to normalize over.
eps (float) – A small additive constant for numerical stability.
affine (bool) – If True, learn an affine transform to apply after the normalization.
bias (bool) – If True, include a translation in the affine transformation. If False, the transformation is not strictly affine, only a scaling.
Methods