This is a tokenizer. It was originally designed as a compression algorithm but was later repurposed as an encoder that turns plaintext into tokens.
Algorithm
- Start with a target vocabulary size and a starting vocabulary size
- Initially, treat each character in the stream as its own token
- Repeatedly merge the most frequent pair of adjacent tokens into a single new token
- Continue until the vocabulary reaches the target size (see the sketch after this list)
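A minimal sketch of this merge loop in Python. The names `train_bpe`, `merge_pair`, and `encode`, the `target_vocab_size` parameter, and the example string are illustrative assumptions, not part of any particular library:

```python
from collections import Counter

def train_bpe(text, target_vocab_size):
    # Start by treating every character in the stream as its own token.
    tokens = list(text)
    vocab = set(tokens)
    merges = []  # ordered list of (left, right) pairs that get merged

    while len(vocab) < target_vocab_size:
        # Count every adjacent pair of tokens.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats; merging further would not help
        merged = left + right
        merges.append((left, right))
        vocab.add(merged)
        tokens = merge_pair(tokens, left, right, merged)

    return merges, vocab

def merge_pair(tokens, left, right, merged):
    # Replace each occurrence of the adjacent pair (left, right) with `merged`.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def encode(text, merges):
    # Apply the learned merges, in training order, to turn plaintext into tokens.
    tokens = list(text)
    for left, right in merges:
        tokens = merge_pair(tokens, left, right, left + right)
    return tokens

merges, vocab = train_bpe("low lower lowest", target_vocab_size=20)
print(encode("lowest", merges))
```

Keeping the merges in training order matters: encoding new text replays them in the same order they were learned, so frequent pairs from training collapse into single tokens again.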