mlx.nn.QuantizedLinear

class QuantizedLinear(input_dims: int, output_dims: int, bias: bool = True, group_size: int = 64, bits: int = 4)

Applies an affine transformation to the input using a quantized weight matrix.

It is the quantized equivalent of mlx.nn.Linear. For now, its parameters are frozen and will not be included in any gradient computation, but this will probably change in the future.
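
For illustration, a minimal sketch of constructing the layer directly and applying it to a batch of inputs (the dimensions here are arbitrary):

    import mlx.core as mx
    import mlx.nn as nn

    # A 4-bit quantized affine layer with the default group size of 64.
    layer = nn.QuantizedLinear(256, 512)

    x = mx.random.normal((8, 256))  # batch of 8 input vectors
    y = layer(x)                    # affine transform -> shape (8, 512)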

QuantizedLinear also provides a classmethod from_linear() to convert linear layers to QuantizedLinear layers.
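
As a sketch of the conversion path (the layer sizes are made up for the example), from_linear() quantizes the weights of an existing layer:

    import mlx.nn as nn

    linear = nn.Linear(256, 512)

    # Replace the float weights with a quantized 4-bit representation.
    qlinear = nn.QuantizedLinear.from_linear(linear, group_size=64, bits=4)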

Parameters:
  • input_dims (int) – The dimensionality of the input features.

  • output_dims (int) – The dimensionality of the output features.

  • bias (bool, optional) – If set to False, the layer will not use a bias. Default: True.

  • group_size (int, optional) – The group size to use for the quantized weight. See quantize(). Default: 64.

  • bits (int, optional) – The bit width to use for the quantized weight. See quantize(). Default: 4.
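
Smaller groups and wider bit widths trade memory for fidelity. Below is a sketch with non-default settings; it assumes, as with quantize(), that input_dims is divisible by group_size:

    import mlx.nn as nn

    # 8-bit weights in groups of 32: larger than the 4-bit default,
    # but a closer approximation of the original float weights.
    layer = nn.QuantizedLinear(128, 256, group_size=32, bits=8)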

Methods

from_linear(linear_layer[, group_size, bits])

Create a QuantizedLinear layer from a Linear layer.

unfreeze(*args, **kwargs)

Wrap unfreeze so that any contained layers are unfrozen, while this layer's own parameters remain frozen.
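
A minimal sketch of this behavior, assuming the quantized parameters stay frozen even after an explicit unfreeze() call:

    import mlx.nn as nn

    layer = nn.QuantizedLinear(64, 64)
    layer.unfreeze()

    # The quantized parameters remain frozen, so nothing is listed.
    print(layer.trainable_parameters())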