mlx.data.core.Tokenizer.__init__

Tokenizer.__init__(self: mlx.data._c.core.Tokenizer, trie: mlx.data._c.core.CharTrie, ignore_unk: bool = False, trie_key_scores: List[float] = []) -> None

Make a tokenizer object that can be used to tokenize arbitrary strings.

Parameters:
  • trie (mlx.data.core.CharTrie) – The trie containing the possible tokens.

  • ignore_unk (bool) – Whether unknown tokens should be ignored (True) or an error should be raised (False). (default: False)

  • trie_key_scores (list[float]) – A list containing one score per trie node. If left empty, each score is assumed to equal 1. The tokenize_shortest method minimizes the sum of these scores over the sequence of tokens.
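
To make the role of trie_key_scores concrete, the following is a minimal pure-Python sketch of score-minimizing tokenization. It is not the library's implementation and does not use mlx.data: the function name tokenize_min_score, the dict-based vocabulary, and the indexing of scores by token id are all assumptions made for illustration, and the ignore_unk behaviour is not modelled (the sketch always raises on untokenizable input).

```python
from typing import Dict, List, Optional


def tokenize_min_score(
    text: str,
    vocab: Dict[str, int],
    scores: Optional[List[float]] = None,
) -> List[int]:
    """Return the token ids whose summed score is minimal (illustrative only)."""
    # Assumption: an empty or missing score list means every token scores 1,
    # so the minimum-score tokenization is simply the one with fewest tokens.
    if not scores:
        scores = [1.0] * (max(vocab.values()) + 1)

    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal total score covering text[:j]
    back = [None] * (n + 1)  # back[j]: (start index, token id) of the last token
    best[0] = 0.0
    max_len = max(len(token) for token in vocab)

    for i in range(n):
        if best[i] == inf:
            continue  # text[:i] cannot be tokenized, so nothing starts here
        for j in range(i + 1, min(n, i + max_len) + 1):
            token_id = vocab.get(text[i:j])
            if token_id is not None and best[i] + scores[token_id] < best[j]:
                best[j] = best[i] + scores[token_id]
                back[j] = (i, token_id)

    if best[n] == inf:
        raise ValueError("input cannot be tokenized with this vocabulary")

    # Walk the back-pointers from the end of the string to recover the tokens.
    tokens = []
    pos = n
    while pos > 0:
        pos, token_id = back[pos]
        tokens.append(token_id)
    return tokens[::-1]


vocab = {"a": 0, "b": 1, "ab": 2, "c": 3}
print(tokenize_min_score("abc", vocab))                        # [2, 3]
print(tokenize_min_score("abc", vocab, [1.0, 1.0, 5.0, 1.0]))  # [0, 1, 3]
```

With uniform scores the two-token split "ab", "c" wins; raising the score of "ab" to 5 makes the three single-character tokens cheaper, so the result changes accordingly.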