mlx.core.quantized_matmul
- quantized_matmul(x: array, w: array, /, scales: array, biases: array, transpose: bool = True, group_size: int = 64, bits: int = 4, *, stream: None | Stream | Device = None) → array
Perform the matrix multiplication with the quantized matrix w. The quantization uses one floating point scale and bias per group_size of elements. Each element in w takes bits bits and is packed in an unsigned 32 bit integer.
- Parameters:
  - x (array) – Input array
  - w (array) – Quantized matrix packed in unsigned integers
  - scales (array) – The scales to use per group_size elements of w
  - biases (array) – The biases to use per group_size elements of w
  - transpose (bool, optional) – Defines whether to multiply with the transposed w or not, namely whether we are performing x @ w.T or x @ w. Default: True.
  - group_size (int, optional) – The size of the group in w that shares a scale and bias. Default: 64.
  - bits (int, optional) – The number of bits occupied by each element in w. Default: 4.
- Returns:
  The result of the multiplication of x with w.
- Return type:
  array
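
A minimal usage sketch, assuming w is produced by the companion mlx.core.quantize function (which returns the packed matrix together with matching scales and biases); the shapes here are illustrative:

```python
import mlx.core as mx

# Weights laid out as (output_dims, input_dims), so transpose=True
# computes x @ w.T as in a typical linear layer.
w = mx.random.normal(shape=(128, 256))
x = mx.random.normal(shape=(10, 256))

# mx.quantize packs w into unsigned 32-bit integers and returns the
# per-group scales and biases that quantized_matmul expects.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# group_size and bits must match the values used when quantizing.
y = mx.quantized_matmul(
    x, w_q, scales=scales, biases=biases,
    transpose=True, group_size=64, bits=4,
)

print(y.shape)  # (10, 128)

# Compare against the full-precision product; 4-bit quantization
# introduces a small approximation error.
print(mx.abs(y - x @ w.T).max())
```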