Jun 24, 2024 Multimodal Tokenization with Vector Quantization: A Review
May 10, 2024 Memory-Efficient Attention: MHA vs. MQA vs. GQA vs. MLA