Jun 15, 2026
Optimizers: From SGD to AdamW
Lab note. Part of the ShivasNotes transformer-from-scratch series. Previously: dL/d(LLM): The Full Backward Pass. The full backward-pass post ended with every weight in the model holding a gr...