mlx.data.tokenizer_helpers.read_trie_from_spm


class mlx.data.tokenizer_helpers.read_trie_from_spm(spm_file)

Read an mlx.data.core.CharTrie from a sentencepiece file.

Reading directly from a model file requires installing sentencepiece. However, if the vocabulary and the scores have been exported, the resulting file can be read without installing sentencepiece.

Parameters:

spm_file (str) – Either a sentencepiece model file or a vocab file extracted from a sentencepiece model.

Returns:

The trie and the corresponding weights from the SPM model.

Return type:

tuple[mlx.data.core.CharTrie, list[float]]
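Example:

A minimal usage sketch, assuming mlx-data is installed. The wrapper name load_spm_trie and the file name in the comments are hypothetical and used only for illustration. The only library call is read_trie_from_spm, with the single argument and the two-element return value documented above.

```python
def load_spm_trie(spm_file: str):
    """Return (trie, weights) for a sentencepiece model or exported vocab file.

    load_spm_trie is a hypothetical convenience wrapper, not part of mlx-data.
    """
    # Imported lazily so that importing this module does not require mlx-data.
    from mlx.data.tokenizer_helpers import read_trie_from_spm

    # Documented return type: tuple[mlx.data.core.CharTrie, list[float]]
    trie, weights = read_trie_from_spm(spm_file)
    return trie, weights


# Hypothetical usage (the path is a placeholder; a model file needs
# sentencepiece installed, an exported vocab file does not):
#
#     trie, weights = load_spm_trie("tokenizer.model")
#     print(f"loaded {len(weights)} token weights")
```

Unpacking the tuple at the call site keeps the trie and its weights together, which is convenient when both are passed on to later tokenization steps.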