Archive | The Gradient

Nice! 72 posts in total. Keep on posting.

2024

06-24

Multimodal Tokenization with Vector Quantization: A Review

05-10

Memory-Efficient Attention: MHA vs. MQA vs. GQA vs. MLA

2023

01-26

Inductive Positions in Transformers

2022

12-12

Diffusion Models: A Mathematical Note from Scratch

05-13

Large Language Models for Programming Languages

04-17

Efficient Large-Scale Distributed Training

01-10

Mask Denoising Strategy for Pre-trained Language Models

2021

11-29

Subword Tokenization in Natural Language Processing

10-09

Scaling Up Large Language Models: A Summary

02-07

Review: Backpropagation step by step