The Gradient
Language is not just words.
Nice! 72 posts in total. Keep on posting.
2024
06-24
Multimodal Tokenization with Vector Quantization: A Review
05-10
Memory-Efficient Attention: MHA vs. MQA vs. GQA vs. MLA
2023
01-26
Inductive Positions in Transformers
2022
12-12
Diffusion Models: A Mathematical Note from Scratch
05-13
Large Language Models for Programming Languages
04-17
Efficient Large-Scale Distributed Training
01-10
Mask Denoising Strategy for Pre-trained Language Models
2021
11-29
Subword Tokenization in Natural Language Processing
10-09
Scaling Up Large Language Models: A Summary
02-07
Review: Backpropagation step by step