mlx.core.dequantize
- dequantize(w: array, /, scales: array, biases: Optional[array] = None, group_size: int = 64, bits: int = 4, mode: str = 'affine', *, stream: Union[None, Stream, Device] = None) → array
Dequantize the matrix w using quantization parameters.
- Parameters:
w (array) – Matrix to be dequantized.
scales (array) – The scales to use per group_size elements of w.
biases (array, optional) – The biases to use per group_size elements of w. Default: None.
group_size (int, optional) – The size of the group in w that shares a scale and bias. Default: 64.
bits (int, optional) – The number of bits occupied by each element in w. Default: 4.
mode (str, optional) – The quantization mode. Default: "affine".
- Returns:
The dequantized version of w.
- Return type:
array
Notes
The currently supported quantization modes are "affine" and "mxfp4".
For affine quantization, given the notation in quantize(), we compute \(w_i\) from \(\hat{w_i}\) and the corresponding \(s\) and \(\beta\) as follows:
\[w_i = s \hat{w_i} + \beta\]
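The affine formula above can be illustrated with a minimal pure-Python sketch. This is not how MLX implements dequantize: the helper name dequantize_affine is hypothetical, the integer codes are assumed to be already unpacked into a flat list (real quantized arrays pack several codes into each element), and the group size of 4 is a toy value chosen to keep the example short.

```python
def dequantize_affine(w_hat, scales, biases, group_size):
    """Apply w_i = s * w_hat_i + beta group by group.

    Hypothetical helper for illustration only. `w_hat` is a flat list of
    already-unpacked integer codes; `scales` and `biases` hold one value
    per group of `group_size` consecutive codes.
    """
    if len(w_hat) % group_size != 0:
        raise ValueError("len(w_hat) must be a multiple of group_size")
    n_groups = len(w_hat) // group_size
    if len(scales) != n_groups or len(biases) != n_groups:
        raise ValueError("need exactly one scale and one bias per group")

    out = []
    for i, code in enumerate(w_hat):
        g = i // group_size  # index of the group this element belongs to
        out.append(scales[g] * code + biases[g])
    return out


# Two groups of four 4-bit codes (values in 0..15), each with its own s and beta.
codes = [0, 1, 2, 3, 0, 5, 10, 15]
scales = [0.5, 0.25]
biases = [-1.0, 2.0]

restored = dequantize_affine(codes, scales, biases, group_size=4)
print(restored)  # [-1.0, -0.5, 0.0, 0.5, 2.0, 3.25, 4.5, 5.75]
```

Every element in a group shares the same scale and bias, so the parameter overhead shrinks as the group size grows, at the cost of a coarser fit to the original values.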