machine-learning
an archive of posts in this category
| Date | Title |
|---|---|
| Dec 12, 2022 | Diffusion Models: A Mathematical Guide from Scratch |
| Apr 17, 2022 | Efficient Distributed Training: From DP to ZeRO and FlashAttention |
| Jan 10, 2022 | Masking Strategies for Pre-trained Language Models: From MLM to T5 |
| Dec 14, 2019 | BERTology: From XLNet to ELECTRA |
| Feb 28, 2019 | Normalization in Neural Networks: BN, LN, RMSNorm, and Beyond |
| Jan 22, 2019 | Attention Mechanisms and the Transformer Architecture |