A stack of a single attention layer and a single MLP layer. The attention layer handles the context of the current stream; the MLP layer provides the logic to get the next logit.
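
As a rough illustration, the stack described above could be sketched in plain Python as follows. This is a minimal sketch under assumptions that are not stated in the note: single-head causal attention, residual additions around each layer, a ReLU MLP, no layer normalization, and a final `unembed` matrix that turns the last position into logits. All function names, parameter names, and sizes here are hypothetical and chosen only for the example.

```python
import math
import random

def matmul(a, b):
    # (n x k) times (k x m) -> (n x m), on plain nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Elementwise sum of two equally shaped matrices
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, wq, wk, wv, wo):
    # Single-head causal self-attention (assumed): position i mixes in
    # information from positions 0..i only.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    mixed = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        weights = softmax(scores)
        mixed.append([sum(w * v[j][d] for j, w in enumerate(weights))
                      for d in range(len(v[0]))])
    return matmul(mixed, wo)

def mlp(x, w1, w2):
    # Position-wise two-layer MLP with ReLU (assumed activation)
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def block(x, p):
    # One attention layer followed by one MLP layer, each added back
    # onto the stream (residual connections are an assumption).
    x = add(x, attention(x, p["wq"], p["wk"], p["wv"], p["wo"]))
    x = add(x, mlp(x, p["w1"], p["w2"]))
    return x

def next_token_logits(x, p):
    # Project the last position onto the vocabulary (hypothetical unembedding)
    return matmul(block(x, p), p["unembed"])[-1]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes, chosen only for illustration
rng = random.Random(0)
d_model, d_hidden, vocab, seq_len = 4, 8, 5, 3
params = {
    "wq": rand_matrix(d_model, d_model, rng),
    "wk": rand_matrix(d_model, d_model, rng),
    "wv": rand_matrix(d_model, d_model, rng),
    "wo": rand_matrix(d_model, d_model, rng),
    "w1": rand_matrix(d_model, d_hidden, rng),
    "w2": rand_matrix(d_hidden, d_model, rng),
    "unembed": rand_matrix(d_model, vocab, rng),
}
x = rand_matrix(seq_len, d_model, rng)  # stand-in for embedded input tokens
logits = next_token_logits(x, params)   # one score per vocabulary entry
```

In this sketch the attention layer is the only place where positions exchange information, while the MLP acts on each position independently, which matches the division of labor described in the note.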