mlx.data.core.CharTrie

class mlx.data.core.CharTrie

A Trie implementation for characters.

It makes it possible to build a graph of all possible tokenizations of a string and then search that graph for the shortest one.
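The idea of searching a tokenization graph for the shortest path can be sketched in plain Python. This is a minimal illustration that does not use mlx.data at all: the nested-dict trie and the helper names `build_trie` and `shortest_tokenization` are hypothetical, and the library's own search may work differently.

```python
def build_trie(tokens):
    """Build a nested-dict trie; the key None marks the end of a token."""
    root = {}
    for token in tokens:
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = token
    return root


def shortest_tokenization(text, root):
    """Return the tokenization of `text` with the fewest tokens, or None."""
    n = len(text)
    # best[i] holds the shortest token list covering text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for start in range(n):
        if best[start] is None:
            continue  # position `start` is unreachable
        node = root
        # Walk the trie from `start`; each accepting node is an edge
        # start -> end + 1 in the graph of possible tokenizations.
        for end in range(start, n):
            node = node.get(text[end])
            if node is None:
                break
            if None in node:
                candidate = best[start] + [node[None]]
                if best[end + 1] is None or len(candidate) < len(best[end + 1]):
                    best[end + 1] = candidate
    return best[n]


trie = build_trie(["a", "b", "ab", "abc", "c", "bc"])
print(shortest_tokenization("abcab", trie))  # ['abc', 'ab']
```

Because every edge costs one token, a simple left-to-right pass over positions is enough here; no priority queue is needed.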

Methods

__init__(self)

insert(self, token)

Insert a token into the trie, creating a new entry if it doesn't already exist.

key(self, id)

Get the id-th token as a list of characters.

key_bytes(self, id)

Get the id-th token as bytes.

key_string(self, id)

Get the string that corresponds to the id-th token.

num_keys(self)

Return how many keys/nodes have been inserted in the Trie.

root(self)

Get the root node of the trie.

search(self, token)

Search for the passed string or list of characters in the trie and return the node, or None if it is not found.
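To make the relationship between these methods concrete, here is a pure-Python stand-in that mirrors the method names listed above. It is only a sketch: `ToyCharTrie` and `ToyNode` are hypothetical names, and the return values, the id assignment order, and the handling of prefixes in `search` are assumptions rather than the documented behaviour of mlx.data.core.CharTrie.

```python
class ToyNode:
    """Hypothetical trie node; id is -1 when no token ends here."""

    def __init__(self):
        self.children = {}
        self.id = -1


class ToyCharTrie:
    """Pure-Python stand-in mirroring the CharTrie method names."""

    def __init__(self):
        self._root = ToyNode()
        self._keys = []  # token id -> list of characters

    def insert(self, token):
        # Accepts a string or a list of characters.
        node = self._root
        for ch in token:
            node = node.children.setdefault(ch, ToyNode())
        if node.id < 0:  # assumption: ids are assigned in insertion order
            node.id = len(self._keys)
            self._keys.append(list(token))
        return node

    def search(self, token):
        node = self._root
        for ch in token:
            node = node.children.get(ch)
            if node is None:
                return None
        # Assumption: a bare prefix of a token counts as "not found".
        return node if node.id >= 0 else None

    def key(self, id):
        return self._keys[id]

    def key_string(self, id):
        return "".join(self._keys[id])

    def key_bytes(self, id):
        return self.key_string(id).encode("utf-8")

    def num_keys(self):
        return len(self._keys)

    def root(self):
        return self._root


toy = ToyCharTrie()
for token in ["hello", "help", "he", "help"]:
    toy.insert(token)  # the second "help" does not create a new key

print(toy.num_keys())                 # 3
print(toy.key_string(1))              # help
print(toy.search("hel"))              # None
print(toy.search(list("he")).id)      # 2
```

In this sketch the id returned through `search` is what ties a node back to `key`, `key_bytes`, and `key_string`.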