Tagged With

QKV

1 post connected to this tag.

Attention: The Core Of The Transformer

Jun 10, 2026

Attention is the core transformer mechanism: Q/K/V projections, head splitting, RoPE, scaled dot-products, masking, softmax, weighted value sums, GQA, Flash Attention, and the full backward ...
