Jun 24, 2024 Multimodal Tokenization with Vector Quantization: A Review
Apr 17, 2022 Efficient Distributed Training: From DP to ZeRO and FlashAttention
Jan 10, 2022 Masking Strategies for Pre-trained Language Models: From MLM to T5
Nov 29, 2021 Subword Tokenization in NLP: BPE, WordPiece, and Unigram
Dec 14, 2019 BERTology: From XLNet to ELECTRA