mlx.data.core.Tokenizer.__init__
- Tokenizer.__init__(self: mlx.data._c.core.Tokenizer, trie: mlx.data._c.core.CharTrie, ignore_unk: bool = False, trie_key_scores: List[float] = []) -> None
Make a tokenizer object that can be used to tokenize arbitrary strings.
- Parameters:
trie (mlx.data.core.CharTrie) – The trie containing the possible tokens.
ignore_unk (bool) – If True, unknown tokens are ignored; otherwise an error is raised when one is encountered. (default: False)
trie_key_scores (list[float]) – A list containing one score per trie node. If left empty, each score is assumed to be 1. The tokenize_shortest() method minimizes the sum of these scores over the sequence of tokens.
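
To illustrate what "minimizes the sum of these scores" means, here is a minimal pure-Python sketch of a minimum-score segmentation. It is not the library's implementation and does not use mlx.data: the function name tokenize_min_score and the dict-based vocabulary are hypothetical stand-ins for the trie and trie_key_scores, and returning None for unsegmentable input is an assumption made for the sketch only.

```python
from typing import Dict, List, Optional


def tokenize_min_score(text: str, vocab: Dict[str, float]) -> Optional[List[str]]:
    """Split text into vocab tokens so that the sum of token scores is minimal.

    Hypothetical helper for illustration; vocab maps each token to its score.
    Returns None when text cannot be covered by tokens from vocab.
    """
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: minimal total score for text[:i]
    back = [-1] * (n + 1)   # back[i]: start index of the last token ending at i
    best[0] = 0.0

    for i in range(n):
        if best[i] == inf:
            continue  # text[:i] cannot be tokenized, so nothing starts here
        for token, score in vocab.items():
            # Skip empty tokens so the backtracking below always makes progress.
            if token and text.startswith(token, i):
                j = i + len(token)
                if best[i] + score < best[j]:
                    best[j] = best[i] + score
                    back[j] = i

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end of the string to recover the tokens.
    tokens: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        tokens.append(text[i:j])
        j = i
    return tokens[::-1]


uniform = {"a": 1.0, "b": 1.0, "ab": 1.0, "c": 1.0}
weighted = {"a": 1.0, "b": 1.0, "ab": 5.0, "c": 1.0}
print(tokenize_min_score("abc", uniform))   # fewest tokens wins when all scores are 1
print(tokenize_min_score("abc", weighted))  # a high score on "ab" favors "a" + "b"
```

With every score equal to 1, the minimal sum coincides with the fewest tokens, which matches the default described for an empty trie_key_scores. Raising the score of a single token steers the segmentation away from it.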