mlx.core.quantized_matmul
- quantized_matmul(x: array, w: array, /, scales: array, biases: array | None = None, transpose: bool = True, group_size: int = 64, bits: int = 4, mode: str = 'affine', *, stream: None | Stream | Device = None) → array
Perform the matrix multiplication with the quantized matrix w. The quantization uses one floating point scale and bias per group_size elements. Each element in w takes bits bits and is packed in an unsigned 32 bit integer.
- Parameters:
  - x (array) – Input array.
  - w (array) – Quantized matrix packed in unsigned integers.
  - scales (array) – The scales to use per group_size elements of w.
  - biases (array, optional) – The biases to use per group_size elements of w. Default: None.
  - transpose (bool, optional) – Defines whether to multiply with the transposed w or not, namely whether we are performing x @ w.T or x @ w. Default: True.
  - group_size (int, optional) – The size of the group in w that shares a scale and bias. Default: 64.
  - bits (int, optional) – The number of bits occupied by each element in w. Default: 4.
  - mode (str, optional) – The quantization mode. Default: "affine".
- Returns:
  The result of the multiplication of x with w.
- Return type:
  array
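
A minimal usage sketch, assuming the packed weights, scales, and biases come from mlx.core.quantize with matching group_size and bits; the shapes and values here are illustrative:

```python
import mlx.core as mx

# Full-precision weights with shape (output_dims, input_dims); with
# transpose=True the product below computes x @ w.T.
w = mx.random.normal(shape=(128, 256))

# Pack w into 4-bit elements, with groups of 64 elements sharing one
# floating point scale and bias.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

x = mx.random.normal(shape=(2, 256))

# Multiply against the packed matrix directly, without dequantizing.
y = mx.quantized_matmul(
    x, w_q, scales=scales, biases=biases,
    transpose=True, group_size=64, bits=4,
)
print(y.shape)  # (2, 128)

# For reference, dequantizing and multiplying in full precision should
# give approximately the same result (up to accumulation order).
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.allclose(y, x @ w_hat.T, atol=1e-3))
```

Note that group_size and bits passed to quantized_matmul must match the values used when quantizing, otherwise w is unpacked inconsistently.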