mlx.core.gather_qmm
- gather_qmm(x: array, w: array, /, scales: array, biases: array | None = None, lhs_indices: array | None = None, rhs_indices: array | None = None, transpose: bool = True, group_size: int | None = None, bits: int | None = None, mode: str = 'affine', *, sorted_indices: bool = False, stream: None | Stream | Device = None) → array
Perform quantized matrix multiplication with matrix-level gather.
This operation is the quantized equivalent of gather_mm(). As with gather_mm(), the indices lhs_indices and rhs_indices contain flat indices along the batch dimensions (i.e. all but the last two dimensions) of x and w respectively.
Note that scales and biases must have the same batch dimensions as w since they represent the same quantized matrix.
- Parameters:
x (array) – Input array.
w (array) – Quantized matrix packed in unsigned integers.
scales (array) – The scales to use per group_size elements of w.
biases (array, optional) – The biases to use per group_size elements of w. Default: None.
lhs_indices (array, optional) – Integer indices for x. Default: None.
rhs_indices (array, optional) – Integer indices for w. Default: None.
transpose (bool, optional) – Defines whether to multiply with the transposed w or not, namely whether we are performing x @ w.T or x @ w. Default: True.
group_size (int, optional) – The size of the group in w that shares a scale and bias. See supported values and defaults in the table of quantization modes. Default: None.
bits (int, optional) – The number of bits occupied by each element of w in the quantized array. See supported values and defaults in the table of quantization modes. Default: None.
mode (str, optional) – The quantization mode. Default: "affine".
sorted_indices (bool, optional) – May allow a faster implementation if the passed indices are sorted. Default: False.
- Returns:
The result of the multiplication of x with w after gathering using lhs_indices and rhs_indices.
- Return type:
array
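The gather semantics described above can be illustrated with a small NumPy reference. This is only a sketch, not the MLX implementation: the helper names dequantize_affine and gather_qmm_reference are hypothetical, the integer codes are assumed to be already unpacked (real w is bit-packed into unsigned integers), and the affine mode is assumed to reconstruct each weight as scale * code + bias with groups running along the last axis.

```python
import numpy as np


def dequantize_affine(q, scales, biases, group_size):
    """Hypothetical helper: rebuild weights as scale * code + bias.

    q:      (..., N, K) unpacked integer codes (assumption: not bit-packed)
    scales: (..., N, K // group_size), one scale per group along the last axis
    biases: same shape as scales
    """
    s = np.repeat(scales, group_size, axis=-1)
    b = np.repeat(biases, group_size, axis=-1)
    return s * q + b


def gather_qmm_reference(x, q, scales, biases, lhs_indices, rhs_indices,
                         transpose=True, group_size=4):
    """Hypothetical reference for the gather-then-multiply semantics."""
    w = dequantize_affine(q, scales, biases, group_size)
    # Flatten all batch dimensions (everything but the last two) so that
    # the indices act as flat indices into the batch of matrices.
    x_flat = x.reshape(-1, *x.shape[-2:])
    w_flat = w.reshape(-1, *w.shape[-2:])
    lhs, rhs = np.broadcast_arrays(lhs_indices, rhs_indices)
    xs = x_flat[lhs]  # selected input matrices
    ws = w_flat[rhs]  # selected weight matrices
    if transpose:
        ws = np.swapaxes(ws, -1, -2)  # compute x @ w.T
    return xs @ ws


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 2, 8))        # 3 inputs, each of shape (M=2, K=8)
q = rng.integers(0, 16, size=(2, 5, 8))   # 2 matrices of 4-bit codes, (N=5, K=8)
scales = rng.standard_normal((2, 5, 2))   # K // group_size = 2 groups per row
biases = rng.standard_normal((2, 5, 2))
lhs_indices = np.array([0, 1, 2, 2])      # which input to use per output
rhs_indices = np.array([1, 0, 0, 1])      # which weight matrix to use per output

out = gather_qmm_reference(x, q, scales, biases, lhs_indices, rhs_indices)
print(out.shape)  # (4, 2, 5)
```

Each output slice out[i] pairs the input selected by lhs_indices[i] with the weight matrix selected by rhs_indices[i]. With MLX itself, the analogous call would pass the packed w together with its scales and biases, for example gather_qmm(x, w, scales, biases, lhs_indices=lhs_indices, rhs_indices=rhs_indices), using the group_size and bits that w was quantized with.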