mlx.data.core.Tokenizer.tokenize_rand
- Tokenizer.tokenize_rand(self: mlx.data._c.core.Tokenizer, input: str) → List[int]
Tokenize the input with a valid tokenization chosen randomly from the set of valid tokenizations.
For instance, if our set of tokens is {‘a’, ‘aa’, ‘b’}, with ids ‘a’ = 0, ‘aa’ = 1 and ‘b’ = 2, then the string ‘aab’ has 2 different tokenizations:
- 0, 0, 2 (‘a’, ‘a’, ‘b’)
- 1, 2 (‘aa’, ‘b’)
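The example above can be reproduced with a small brute-force sketch that enumerates every valid tokenization and then samples one. It is not the mlx-data implementation: `all_tokenizations` and `tokenize_rand_sketch` are hypothetical helpers, token ids are assumed to be positions in the vocabulary list, and sampling is assumed to be uniform over the valid tokenizations (the documentation only says the choice is random). Enumeration is exponential in the worst case, so this is for illustration only.

```python
import random
from typing import List, Sequence


def all_tokenizations(text: str, vocab: Sequence[str]) -> List[List[int]]:
    """Return every way to split `text` into tokens from `vocab`, as lists of ids.

    Assumptions for this sketch: a token's id is its index in `vocab`,
    and `vocab` contains no empty strings (they would recurse forever).
    """
    if text == "":
        return [[]]  # exactly one way to tokenize the empty string
    results: List[List[int]] = []
    for token_id, token in enumerate(vocab):
        if text.startswith(token):
            # Use `token` as the first piece, then tokenize the remainder.
            for rest in all_tokenizations(text[len(token):], vocab):
                results.append([token_id] + rest)
    return results


def tokenize_rand_sketch(text: str, vocab: Sequence[str], rng: random.Random) -> List[int]:
    """Hypothetical helper: pick one valid tokenization uniformly at random."""
    candidates = all_tokenizations(text, vocab)
    if not candidates:
        raise ValueError(f"{text!r} cannot be tokenized with this vocabulary")
    return rng.choice(candidates)


vocab = ["a", "aa", "b"]
print(all_tokenizations("aab", vocab))  # [[0, 0, 2], [1, 2]]
print(tokenize_rand_sketch("aab", vocab, random.Random(0)))  # one of the two
```

Passing an explicit `random.Random` keeps the sampling reproducible in tests.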
Tokenizer.tokenize_shortest() will return the second one (1, 2) if no trie_key_scores are provided, while Tokenizer.tokenize_rand() will sample either of the two.
- Parameters:
input (str) – The input string to be tokenized.
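For contrast with `tokenize_rand()`, the "fewest tokens" choice that `tokenize_shortest()` makes when no trie_key_scores are given can be sketched as a dynamic program over string positions. This is an assumed reconstruction of the behavior described above, not the mlx-data code: `tokenize_shortest_sketch` is hypothetical, and token ids are again assumed to be indices into the vocabulary list.

```python
from typing import List, Optional, Sequence, Tuple


def tokenize_shortest_sketch(text: str, vocab: Sequence[str]) -> Optional[List[int]]:
    """Return a tokenization of `text` with the fewest tokens, or None if none exists.

    best[i] is the fewest tokens needed to cover text[:i]; back[i] records the
    (start, token_id) pair that achieved it so the ids can be rebuilt.
    """
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)
    back: List[Optional[Tuple[int, int]]] = [None] * (n + 1)
    best[0] = 0
    for i in range(n):
        if best[i] == inf:
            continue  # text[:i] cannot be covered by any tokenization
        for token_id, token in enumerate(vocab):
            # Skip empty tokens; startswith(token, i) guarantees j <= n.
            if token and text.startswith(token, i):
                j = i + len(token)
                if best[i] + 1 < best[j]:
                    best[j] = best[i] + 1
                    back[j] = (i, token_id)
    if best[n] == inf:
        return None
    # Walk the back-pointers from the end of the string to the start.
    ids: List[int] = []
    j = n
    while j > 0:
        i, token_id = back[j]
        ids.append(token_id)
        j = i
    return ids[::-1]


print(tokenize_shortest_sketch("aab", ["a", "aa", "b"]))  # [1, 2]
```

The dynamic program runs in O(len(text) × len(vocab)) string comparisons and always returns the same answer, whereas random sampling can return any valid tokenization, including longer ones such as 0, 0, 2.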