mlx.core.gather_qmm
gather_qmm(x: array, w: array, /, scales: array, biases: array, lhs_indices: array | None = None, rhs_indices: array | None = None, transpose: bool = True, group_size: int = 64, bits: int = 4, *, stream: None | Stream | Device = None) -> array
Perform quantized matrix multiplication with matrix-level gather.
This operation is the quantized equivalent to gather_mm(). Similar to gather_mm(), the indices lhs_indices and rhs_indices contain flat indices along the batch dimensions (i.e. all but the last two dimensions) of x and w respectively.

Note that scales and biases must have the same batch dimensions as w since they represent the same quantized matrix.

- Parameters:
x (array) – Input array.
w (array) – Quantized matrix packed in unsigned integers.
scales (array) – The scales to use per group_size elements of w.
biases (array) – The biases to use per group_size elements of w.
lhs_indices (array, optional) – Integer indices for x. Default: None.
rhs_indices (array, optional) – Integer indices for w. Default: None.
transpose (bool, optional) – Defines whether to multiply with the transposed w or not, namely whether we are performing x @ w.T or x @ w. Default: True.
group_size (int, optional) – The size of the group in w that shares a scale and bias. Default: 64.
bits (int, optional) – The number of bits occupied by each element in w. Default: 4.
- Returns:
The result of the multiplication of x with w after gathering using lhs_indices and rhs_indices.
- Return type: array