mlx.nn.QuantizedLinear
- class QuantizedLinear(input_dims: int, output_dims: int, bias: bool = True, group_size: int = 64, bits: int = 4)
Applies an affine transformation to the input using a quantized weight matrix.

It is the quantized equivalent of mlx.nn.Linear. For now, its parameters are frozen and will not be included in any gradient computation, but this will probably change in the future.

QuantizedLinear also provides a classmethod from_linear() to convert linear layers to QuantizedLinear layers.
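For example, a minimal sketch of constructing the layer directly and checking that its parameters stay out of gradient computation (the layer sizes and input shape here are illustrative, not prescribed by the API):

```python
import mlx.core as mx
import mlx.nn as nn

# Construct a quantized linear layer directly. input_dims should be
# divisible by group_size (here 512 / 64 = 8 groups per row).
layer = nn.QuantizedLinear(input_dims=512, output_dims=256, group_size=64, bits=4)

x = mx.random.normal((8, 512))  # a batch of 8 illustrative inputs
y = layer(x)                    # affine transform with the quantized weight
print(y.shape)                  # (8, 256)

# The layer's parameters are frozen, so they are excluded from
# gradient computation.
print(len(layer.trainable_parameters()))  # 0
```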
- Parameters:
  - input_dims (int) – The dimensionality of the input features.
  - output_dims (int) – The dimensionality of the output features.
  - bias (bool, optional) – If set to False then the layer will not use a bias. Default: True.
  - group_size (int, optional) – The group size to use for the quantized weight. See quantize() and the sketch below. Default: 64.
  - bits (int, optional) – The bit width to use for the quantized weight. See quantize(). Default: 4.
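As a rough sketch of what group_size and bits control: mlx.core.quantize() splits each row of the weight into groups of group_size elements, storing a scale and bias per group, and encodes each element with bits bits. The sizes below are illustrative:

```python
import mlx.core as mx

w = mx.random.normal((256, 512))  # an illustrative weight matrix

# 4-bit quantization with one scale/bias per group of 64 consecutive
# elements in each row.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Dequantize to recover an approximation of the original weights.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())  # the quantization error is small
```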
Methods

- from_linear(linear_layer[, group_size, bits]) – Create a QuantizedLinear layer from a Linear layer (see the example below).
- unfreeze(*args, **kwargs) – Wrap unfreeze so that we unfreeze any layers we might contain, but our parameters will remain frozen.
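A short sketch of the from_linear() conversion path (layer sizes illustrative):

```python
import mlx.nn as nn

# Start from a regular float-precision linear layer.
linear = nn.Linear(512, 256)

# Convert it: the weights are quantized to 4 bits with a group size of 64.
qlinear = nn.QuantizedLinear.from_linear(linear, group_size=64, bits=4)

# unfreeze() is wrapped so the quantized parameters stay frozen even
# when the layer (or a model containing it) is unfrozen.
qlinear.unfreeze()
print(len(qlinear.trainable_parameters()))  # still 0
```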