This is a layer for a Transformer.
It comprises several heads that correspond to the current token within a sequence.
Every head sends its value in parallel to the Feature Decoder.
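
The data flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: each head is modeled as a fixed linear projection of the current token, and the decoder simply concatenates the head values. The names `make_head`, `multi_head_layer`, and `feature_decoder` are hypothetical, and "parallel" is read here as "the heads do not depend on one another".

```python
from typing import Callable, List, Sequence

Vector = List[float]
Head = Callable[[Vector], Vector]


def make_head(weights: Sequence[Sequence[float]]) -> Head:
    """Build a head that applies a fixed linear projection (assumed form)."""
    def head(token: Vector) -> Vector:
        # One output component per weight row: dot(row, token).
        return [sum(w * x for w, x in zip(row, token)) for row in weights]
    return head


def multi_head_layer(token: Vector, heads: Sequence[Head]) -> List[Vector]:
    """Evaluate every head on the same current token.

    No head reads another head's output, so the calls are independent
    and could be run concurrently; here they are evaluated one by one.
    """
    return [head(token) for head in heads]


def feature_decoder(head_values: Sequence[Vector]) -> Vector:
    """Hypothetical decoder: concatenate all head values into one vector."""
    return [v for value in head_values for v in value]


token = [1.0, 2.0]  # the current token, as a toy 2-dimensional vector
heads = [
    make_head([[1.0, 0.0]]),  # head 0 keeps the first component
    make_head([[0.0, 1.0]]),  # head 1 keeps the second component
    make_head([[1.0, 1.0]]),  # head 2 sums both components
]

head_values = multi_head_layer(token, heads)
features = feature_decoder(head_values)
print(features)  # [1.0, 2.0, 3.0]
```

Because the heads share no state, the list comprehension in `multi_head_layer` could be swapped for a batched matrix product or a worker pool without changing what the decoder receives.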