These are Models that take a sequence of input tokens and output a probability distribution over the next token, which can be repeatedly sampled to continue the generation.
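A minimal sketch of that sampling loop, assuming a hypothetical `model` that maps a (batch, seq) tensor of token IDs to per-position logits over the vocabulary:

```python
import torch

def generate(model, tokens, steps, temperature=1.0):
    """Autoregressive loop: feed the sequence in, sample the next token, append, repeat."""
    for _ in range(steps):
        logits = model(tokens.unsqueeze(0))[0, -1]            # logits for the next token only
        probs = torch.softmax(logits / temperature, dim=-1)   # logits -> probability distribution
        next_token = torch.multinomial(probs, num_samples=1)  # sample one token from it
        tokens = torch.cat([tokens, next_token])              # extend the sequence and continue
    return tokens
```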
Structure
- Input text is Tokenized, often through BPE (Byte Pair Encoding); a short tokenizer example follows this list.
- The Encoder maps each token to its vocabulary ID, which is equivalent to a One Hot Encoding over the vocabulary
- The encoded tokens are Embedded into dense vectors through a Look Up Table Matrix, one row per vocabulary entry (see the embedding sketch after the list)
- The embedded vectors are sent into the Residual Stream, which every layer reads from and writes back into
- Each layer is a Transformer Block (which includes an Attention Layer and an MLP layer, each adding its output back into the Residual Stream; see the block sketch after the list)
- Causal Attention masks each position so it can only attend to earlier tokens; this lets every position in the sequence (and other sequences in the batch) be processed in parallel
- The Decoder (unembedding) projects the final Residual Stream vector back to logits over the vocabulary, which a softmax turns into the next-token distribution that generation samples from (see the decoding sketch after the list)
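A quick look at BPE tokenization, assuming the third-party `tiktoken` library is installed (any BPE tokenizer behaves the same way):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # load the GPT-2 BPE vocabulary and merges
ids = enc.encode("Language models are fun")  # text -> list of integer token IDs
print(ids)                                   # one ID per BPE token
print(enc.decode(ids))                       # IDs -> original text round-trips
```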
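A sketch of the embedding step: the token ID (equivalently, a one-hot vector) selects one row of a learned lookup-table matrix. Sizes here are made up for illustration.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 768            # illustrative sizes, not from any particular model
embed = nn.Embedding(vocab_size, d_model)    # the Look Up Table Matrix, one row per token

ids = torch.tensor([[15496, 995]])           # a batch of token IDs, shape (batch, seq)
x = embed(ids)                               # dense vectors, shape (batch, seq, d_model)

# The same lookup written as a one-hot vector times the embedding matrix:
one_hot = torch.nn.functional.one_hot(ids, vocab_size).float()
assert torch.allclose(one_hot @ embed.weight, x)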
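A stripped-down Transformer Block, showing the Residual Stream (each sublayer adds its output back in) and the causal mask that stops a position from attending to later tokens. This is a sketch, not a faithful reproduction of any specific architecture (no dropout, simplified norm placement).

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):                      # x: (batch, seq, d_model), the residual stream
        seq = x.size(1)
        # causal mask: True entries are blocked, so position i only sees positions <= i
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                       # write attention output back into the stream
        x = x + self.mlp(self.ln2(x))          # write MLP output back into the stream
        return x
```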
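Finally, a sketch of the decoding step: an unembedding projection turns the last position's residual-stream vector into vocabulary logits, and a softmax turns those into the distribution that gets sampled. The untied `nn.Linear` here is just one common choice; many models tie this matrix to the input embedding.

```python
import torch
import torch.nn as nn

d_model, vocab_size = 768, 50_000
unembed = nn.Linear(d_model, vocab_size, bias=False)  # the unembedding / output projection

x_last = torch.randn(1, d_model)           # final residual-stream vector at the last position
logits = unembed(x_last)                   # scores over the whole vocabulary, shape (1, vocab)
probs = torch.softmax(logits, dim=-1)      # the next-token distribution
next_token = torch.multinomial(probs, 1)   # sample the continuation token
```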