May 10, 2024 — Memory-Efficient Attention: MHA vs. MQA vs. GQA vs. MLA
Jan 26, 2023 — Positional Encoding in Transformers: From Sinusoidal to RoPE