This is a layer of a Transformer. It comprises several heads, each of which corresponds to the current token within a sequence. Every head sends its value in parallel to the Feature Decoder.
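
To make the idea of parallel heads concrete, here is a minimal NumPy sketch. It assumes the layer behaves like standard multi-head self-attention, which the description above does not state explicitly. The function name `multi_head_layer`, the weight names `w_q`, `w_k`, `w_v`, and the use of a single output projection `w_out` as a stand-in for the Feature Decoder are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_layer(x, w_q, w_k, w_v, w_out):
    """Hypothetical multi-head layer (assumed to be self-attention).

    x:     (seq_len, d_model) token representations
    w_q, w_k, w_v: (n_heads, d_model, d_head) per-head projections
    w_out: (n_heads * d_head, d_model) stand-in for the Feature Decoder
    """
    # Project every token once per head; all heads are computed together.
    q = np.einsum("sd,hde->hse", x, w_q)
    k = np.einsum("sd,hde->hse", x, w_k)
    v = np.einsum("sd,hde->hse", x, w_v)

    d_head = q.shape[-1]
    # For each head, score the current token against every token.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)

    # Each head produces its own value for every token position.
    heads = weights @ v  # (heads, seq, d_head)

    # Concatenate the head values per token and pass them on together.
    concat = heads.transpose(1, 0, 2).reshape(x.shape[0], -1)
    return concat @ w_out, weights


rng = np.random.default_rng(0)
seq_len, d_model, n_heads, d_head = 4, 8, 2, 4

x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(n_heads, d_model, d_head))
w_k = rng.normal(size=(n_heads, d_model, d_head))
w_v = rng.normal(size=(n_heads, d_model, d_head))
w_out = rng.normal(size=(n_heads * d_head, d_model))

out, weights = multi_head_layer(x, w_q, w_k, w_v, w_out)
print(out.shape)  # (4, 8)
```

Stacking the heads along a leading axis lets one batched matrix product compute all of them at once, which is one common way to realize "in parallel". A real Feature Decoder may do more than a single linear projection.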