mlx.core.dequantize

dequantize(w: array, /, scales: array, biases: Optional[array] = None, group_size: int = 64, bits: int = 4, mode: str = 'affine', *, stream: Union[None, Stream, Device] = None) → array

Dequantize the matrix w using the quantization parameters (scales and optional biases) produced by quantize().

Parameters:
  • w (array) – The quantized matrix to be dequantized.

  • scales (array) – The scales to use per group_size elements of w.

  • biases (array, optional) – The biases to use per group_size elements of w. Default: None.

  • group_size (int, optional) – The size of the group in w that shares a scale and bias. Default: 64.

  • bits (int, optional) – The number of bits occupied by each element in w. Default: 4.

  • mode (str, optional) – The quantization mode. Default: "affine".
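To make group_size and bits concrete, the plain-Python sketch below unpacks bits-wide integer codes from 32-bit words and counts how many scale/bias groups a row needs. The helper names unpack_codes and num_groups are hypothetical. The least-significant-bits-first packing order is an assumption for illustration, not a statement about MLX's internal storage layout.

```python
# Hypothetical helper: unpack `bits`-wide unsigned codes from 32-bit words.
# ASSUMPTION: codes are stored least-significant-bits first within each word.
# This illustrates the idea only; it is not guaranteed to match MLX's layout.
def unpack_codes(words, bits):
    if 32 % bits != 0:
        raise ValueError("this sketch only handles bit widths that divide 32")
    per_word = 32 // bits          # how many codes fit in one 32-bit word
    mask = (1 << bits) - 1         # keeps the low `bits` bits
    codes = []
    for word in words:
        for j in range(per_word):
            codes.append((word >> (j * bits)) & mask)
    return codes


# Hypothetical helper: one scale (and, in affine mode, one bias) per group.
def num_groups(row_length, group_size):
    if row_length % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return row_length // group_size
```

For example, a row of 4096 elements with group_size=64 needs num_groups(4096, 64) == 64 scales, and unpack_codes([0x76543210], 4) yields the codes 0 through 7 under the assumed packing order.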

Returns:

The dequantized version of w.

Return type:

array

Notes

The currently supported quantization modes are "affine" and "mxfp4".
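For intuition about the "mxfp4" mode, the sketch below decodes values following the OCP Microscaling (MX) specification. In that format, each 4-bit element is an FP4 (E2M1) number and each block of elements shares one 8-bit power-of-two (E8M0) scale. The function names are hypothetical. How MLX packs these codes and scales in memory is not described on this page, so treat this as an illustration of the number format rather than of MLX's implementation.

```python
# Sketch of MXFP4 decoding following the OCP Microscaling (MX) spec.
# ASSUMPTION: MLX's exact bit layout for mode="mxfp4" may differ; this only
# shows how an E2M1 code and a shared E8M0 scale combine into a value.

# Magnitudes of the eight non-negative E2M1 codes 0b0000 .. 0b0111.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code):
    # Bit 3 is the sign bit; bits 0-2 select the magnitude.
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def decode_e8m0(scale_bits):
    # E8M0 is a pure power of two with exponent bias 127; 0xFF encodes NaN.
    if scale_bits == 0xFF:
        return float("nan")
    return 2.0 ** (scale_bits - 127)


def dequantize_mxfp4_block(codes, scale_bits):
    # Every element in the block is multiplied by the same shared scale.
    scale = decode_e8m0(scale_bits)
    return [decode_e2m1(c) * scale for c in codes]
```

For example, dequantize_mxfp4_block([0b0010, 0b1111, 0b0001], 128) uses the shared scale 2.0 and yields [2.0, -12.0, 1.0].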

For affine quantization, given the notation in quantize(), we compute \(w_i\) from the quantized value \(\hat{w_i}\) and the corresponding scale \(s\) and bias \(\beta\) of its group as follows:

\[w_i = s \hat{w_i} + \beta\]
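The formula above can be illustrated in plain Python on already-unpacked codes. Each element i uses the scale s and bias β of the group it falls in, that is, group i // group_size. The helper dequantize_affine is hypothetical and works on Python lists instead of MLX arrays.

```python
def dequantize_affine(codes, scales, biases, group_size):
    """Apply w_i = s * w_hat_i + beta group by group to a flat row of codes.

    Hypothetical helper: `codes` are already-unpacked integer values, and
    `scales`/`biases` hold one entry per group of `group_size` elements.
    """
    if len(codes) != len(scales) * group_size or len(scales) != len(biases):
        raise ValueError("expected one scale and one bias per group")
    out = []
    for i, code in enumerate(codes):
        g = i // group_size  # index of the group this element belongs to
        out.append(scales[g] * code + biases[g])
    return out
```

For instance, dequantize_affine([0, 1, 2, 3], [0.5, 2.0], [-1.0, 10.0], 2) returns [-1.0, -0.5, 14.0, 16.0]. In MLX itself, the round trip is typically written as w_q, scales, biases = mx.quantize(w, group_size=64, bits=4) followed by mx.dequantize(w_q, scales, biases, group_size=64, bits=4), assuming import mlx.core as mx.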