These are Models that take a sequence of input tokens and produce a probability distribution over the next token; sampling from that distribution and feeding the result back in, repeatedly, continues the generation one token at a time.
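
A minimal sketch of that sampling loop, assuming a hypothetical next_token_distribution(tokens) callable that returns the model's probabilities for the next token (the name and signature are placeholders, not a real library API):

```python
import random

def generate(prompt_tokens, next_token_distribution, max_new_tokens=20, eos_id=0):
    # Start from the prompt and grow the sequence one token at a time.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)                # distribution over the vocabulary
        next_id = random.choices(range(len(probs)), weights=probs)[0]  # sample one token
        tokens.append(next_id)                                 # feed it back in and repeat
        if next_id == eos_id:                                  # stop at end-of-sequence
            break
    return tokens
```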

Structure

  1. Input text is Tokenized, often through Byte Pair Encoding (BPE).
  2. The Encoder maps each token to an integer ID in the vocabulary, which is equivalent to a One Hot vector over the vocabulary.
  3. The token IDs are Embedded through a Look Up Table Matrix, turning each ID into a dense vector.
  4. The embedded tokens are sent into the Residual Stream
    1. Each layer in the stream is a Transformer Block (which includes an Attention Layer and an MLP layer, each adding its output back into the stream)
  5. Attention is Causal: each position can only attend to itself and earlier tokens, so all positions in a sequence are processed in parallel, alongside other sequences in the batch.
  6. The Decoder (unembedding) projects the residual stream back into logits over the vocabulary; a softmax turns these into the distribution the next token is sampled from (see the sketch after this list).
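
A toy NumPy sketch of steps 1-6, with random weights, a single transformer block, single-head attention, and layer norm omitted for brevity; all sizes and weight names (W_embed, W_q, etc.) are illustrative assumptions, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16

# 3. Embedding: a lookup-table matrix, one row per vocabulary entry.
W_embed = rng.normal(size=(vocab_size, d_model))

# Transformer-block weights (a real model stacks many blocks and learns these).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))
W_o = rng.normal(size=(d_model, d_model))
W_mlp1 = rng.normal(size=(d_model, 4 * d_model))
W_mlp2 = rng.normal(size=(4 * d_model, d_model))

# 6. Unembedding: projects the residual stream back to vocabulary logits.
W_unembed = rng.normal(size=(d_model, vocab_size))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(x):
    # 5. Each position may only attend to itself and earlier positions.
    T = x.shape[0]
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d_model)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # mask out future positions
    scores[future] = -np.inf
    return softmax(scores) @ v @ W_o

def transformer_block(x):
    # 4. Attention and MLP each add their output back into the residual stream.
    x = x + causal_attention(x)
    x = x + np.maximum(0, x @ W_mlp1) @ W_mlp2          # simple ReLU MLP
    return x

# 1-2. Pretend the tokenizer has already turned the text into integer IDs.
token_ids = np.array([5, 17, 42, 8])

x = W_embed[token_ids]                   # 3. embedding lookup
x = transformer_block(x)                 # 4-5. one block of the residual stream
logits = x @ W_unembed                   # 6. decode to vocabulary logits
next_token_probs = softmax(logits[-1])   # distribution for the next token
print(next_token_probs.shape)            # (vocab_size,)
```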

Concepts