mlx.core.fast.affine_quantize#
- affine_quantize(w: array, /, scales: array, biases: array, group_size: int = 64, bits: int = 4, *, stream: None | Stream | Device = None) → array
Quantize the matrix `w` using the provided `scales` and `biases` and the `group_size` and `bits` configuration.

Formally, given the notation in `quantize()`, we compute \(w_i\) from \(\hat{w_i}\) and the corresponding \(s\) and \(\beta\) as follows:

\[w_i = s (\hat{w_i} + \beta)\]

- Parameters:
  - w (array) – Matrix to be quantized
  - scales (array) – The scales to use per `group_size` elements of `w`
  - biases (array) – The biases to use per `group_size` elements of `w`
  - group_size (int, optional) – The size of the group in `w` that shares a scale and bias. (default: `64`)
  - bits (int, optional) – The number of bits occupied by each element in `w`. (default: `4`)
- Returns:
  The quantized version of `w`
- Return type:
  array
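To make the per-group arithmetic concrete, here is a plain NumPy sketch of the mapping implied by the formula above (this is an illustration, not the MLX implementation: `affine_quantize_ref` is a hypothetical name, and the bit-packing of multiple elements per machine word that MLX performs is omitted). Inverting \(w_i = s(\hat{w_i} + \beta)\) gives \(\hat{w_i} = w_i / s - \beta\), rounded and clipped to the `bits`-wide integer range:

```python
import numpy as np

def affine_quantize_ref(w, scales, biases, group_size=64, bits=4):
    """Reference sketch: map each element of w to an integer code in
    [0, 2**bits - 1] using its group's scale s and bias beta.
    Each run of group_size consecutive elements shares one (s, beta) pair."""
    flat = w.reshape(-1, group_size)          # one row per group
    s = np.asarray(scales).reshape(-1, 1)     # per-group scales
    beta = np.asarray(biases).reshape(-1, 1)  # per-group biases
    # Invert w_i = s * (w_hat_i + beta)  =>  w_hat_i = w_i / s - beta
    w_hat = np.round(flat / s - beta)
    w_hat = np.clip(w_hat, 0, 2**bits - 1)
    return w_hat.reshape(w.shape).astype(np.uint8)
```

For example, if a group of values was produced from known codes via \(w_i = s(\hat{w_i} + \beta)\), the sketch recovers those codes exactly.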