mlx.nn.QuantizedLinear#
- class QuantizedLinear(input_dims: int, output_dims: int, bias: bool = True, group_size: int = 64, bits: int = 4, mode: str = 'affine')#
Applies an affine transformation to the input using a quantized weight matrix.

It is the quantized equivalent of mlx.nn.Linear. For now, its parameters are frozen and will not be included in any gradient computation, but this will probably change in the future. QuantizedLinear also provides a classmethod from_linear() to convert Linear layers to QuantizedLinear layers.

- Parameters:
  - input_dims (int) – The dimensionality of the input features.
  - output_dims (int) – The dimensionality of the output features.
  - bias (bool, optional) – If set to False then the layer will not use a bias. Default: True.
  - group_size (int, optional) – The group size to use for the quantized weight. See quantize(). Default: 64.
  - bits (int, optional) – The bit width to use for the quantized weight. See quantize(). Default: 4.
  - mode (str) – The quantization method to use (see mlx.core.quantize()). Default: "affine".
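The group_size and bits parameters control the affine quantization of the weight matrix: each contiguous group of group_size weights shares one float scale and zero point, and each weight in the group is stored as a bits-bit integer. A minimal pure-Python sketch of this scheme (hypothetical helpers for illustration, not the MLX implementation):

```python
def affine_quantize_group(weights, bits=4):
    """Quantize one group of weights to `bits`-bit integers.

    Each group keeps its own scale and zero point so that
    w ≈ scale * q + zero_point.
    """
    levels = (1 << bits) - 1              # e.g. 15 levels for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min                # ints, scale, zero point

def affine_dequantize_group(q, scale, zero_point):
    """Recover approximate float weights from the quantized group."""
    return [scale * qi + zero_point for qi in q]

# With group_size=64, each weight row is split into chunks of 64
# values, and every chunk is quantized independently:
row = [0.1 * i for i in range(128)]
group_size = 64
groups = [row[i:i + group_size] for i in range(0, len(row), group_size)]
quantized = [affine_quantize_group(g) for g in groups]
```

Smaller groups and more bits give a closer approximation of the original weights at the cost of more storage per weight.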
Methods

- from_linear(linear_layer[, group_size, ...]) – Create a QuantizedLinear layer from a Linear layer.