This is a tokenizer. It was originally designed as a compression algorithm, but was later turned into an encoder: it encodes plaintext into tokens.

Algorithm

  1. Start with a target vocabulary size and a starting vocabulary size
  2. Treat each character in the stream as its own token
  3. Merge the most frequent pair of adjacent tokens into a single new token
  4. Repeat until the vocabulary reaches the target size (see the sketch below)
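This merge procedure matches byte-pair encoding (BPE). Below is a minimal Python sketch under that assumption; the function name, sample text, and target size are illustrative, not taken from the original.

```python
from collections import Counter

def train_bpe(text, target_vocab_size):
    """Learn merge rules by repeatedly merging the most frequent adjacent pair."""
    # Step 2: treat every character as its own token.
    tokens = list(text)
    vocab = set(tokens)
    merges = []

    # Step 4: keep merging until the vocabulary reaches the target size.
    while len(vocab) < target_vocab_size:
        # Step 3: count every adjacent pair of tokens and pick the most frequent.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        new_token = best[0] + best[1]
        merges.append(best)
        vocab.add(new_token)

        # Replace every occurrence of the chosen pair with the merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(new_token)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged

    return merges, vocab

# Example: learn merges on a tiny corpus with an illustrative target size.
merges, vocab = train_bpe("low lower lowest", target_vocab_size=15)
print(merges)
```

Encoding new text then amounts to replaying the learned merges in order on its character stream.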