mlx.data.tokenizer_helpers.read_trie_from_spm
- class mlx.data.tokenizer_helpers.read_trie_from_spm(spm_file)
Read an mlx.data.core.CharTrie from a sentencepiece file. Reading directly from a model file requires installing sentencepiece. However, if the vocabulary and the scores have been exported to a vocab file, that file can be read without installing sentencepiece.
- Parameters:
spm_file (str) – Path to either a sentencepiece model file or a vocab file exported from a sentencepiece model.
- Returns:
The trie and the corresponding weights from the SPM model.
- Return type:
tuple[mlx.data.core.CharTrie, list[float]]
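An exported vocab file can be read without sentencepiece, and the function returns a trie together with a list of weights. The self-contained sketch below illustrates what such a pair holds. It builds a character trie and a parallel list of scores from vocab lines. The sketch assumes the tab-separated "piece<TAB>score" layout that sentencepiece's spm_export_vocab tool writes. It also assumes that each token's id is its position in the file, so that weights[id] is that token's score. SimpleTrie and parse_vocab are hypothetical names that are not part of mlx.data, and this is not the library's implementation.

```python
from typing import Dict, Iterable, List, Tuple


class SimpleTrie:
    """Hypothetical character trie, a stand-in for mlx.data.core.CharTrie."""

    def __init__(self) -> None:
        self.children: Dict[str, "SimpleTrie"] = {}
        self.token_id: int = -1  # -1 means no token ends at this node

    def insert(self, token: str, token_id: int) -> None:
        # Walk (and create) one node per character, then mark the end node.
        node = self
        for ch in token:
            node = node.children.setdefault(ch, SimpleTrie())
        node.token_id = token_id

    def search(self, token: str) -> int:
        # Return the token id, or -1 if the string is not a complete token.
        node = self
        for ch in token:
            if ch not in node.children:
                return -1
            node = node.children[ch]
        return node.token_id


def parse_vocab(lines: Iterable[str]) -> Tuple[SimpleTrie, List[float]]:
    """Hypothetical reader for 'piece<TAB>score' lines (assumed format)."""
    trie = SimpleTrie()
    weights: List[float] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        piece, score = line.rsplit("\t", 1)
        # Assumed: the id is the position among non-empty lines,
        # so weights[id] holds that token's score.
        trie.insert(piece, len(weights))
        weights.append(float(score))
    return trie, weights


if __name__ == "__main__":
    vocab = ["<unk>\t0", "▁the\t-3.5", "▁a\t-4.0"]
    trie, weights = parse_vocab(vocab)
    token_id = trie.search("▁the")
    print(token_id, weights[token_id])  # prints: 1 -3.5
```

Keeping the scores in a flat list indexed by token id, rather than storing them inside the trie nodes, mirrors the (trie, list[float]) shape of the return type above.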