May 9, 2026 • LLM Architecture
Why GQA Changes the KV Cache Bill Before Quantization
How grouped-query attention changes the KV cache formula before FP8 or INT4: MHA vs MQA vs GQA, decode bandwidth, and serving capacity math.
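As a pointer to the capacity math this teaser mentions, here is a minimal sketch of per-sequence KV cache size under MHA, GQA, and MQA. The layer count, head counts, context length, and precision below are illustrative assumptions, not values taken from the article.

```python
# Minimal sketch (illustrative config, not from the article): per-sequence
# KV cache size under MHA, GQA, and MQA. The cache holds one K and one V
# vector per layer, per KV head, per token position.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem  # 2x: K and V

# Hypothetical 8B-class shape: 32 layers, 32 query heads, 128-dim heads,
# 8192-token context, FP16 (2 bytes per element).
mha = kv_cache_bytes(32, n_kv_heads=32, head_dim=128, seq_len=8192)  # 1 KV head per query head
gqa = kv_cache_bytes(32, n_kv_heads=8,  head_dim=128, seq_len=8192)  # 4 query heads per KV head
mqa = kv_cache_bytes(32, n_kv_heads=1,  head_dim=128, seq_len=8192)  # all queries share 1 KV head
print(f"MHA {mha / 2**30:.2f} GiB | GQA {gqa / 2**30:.2f} GiB | MQA {mqa / 2**30:.3f} GiB")
# MHA 4.00 GiB | GQA 1.00 GiB | MQA 0.125 GiB: GQA shrinks the cache by
# n_q_heads / n_kv_heads before any FP8 or INT4 quantization is applied.
```

The same factor carries over to decode bandwidth, since each generated token rereads the entire cache.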
Transformer internals, context behavior, and model design trade-offs.
A walkthrough of scaled dot-product attention (Q/K/V), softmax temperature, and why increasing head count shifts attention statistics and behavior.
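This second teaser also lends itself to a short worked example. Below is a minimal sketch of scaled dot-product attention and of why the 1/sqrt(d_k) scale behaves like a softmax temperature; the matrix shapes and the head-split interpretation are illustrative assumptions, not values from the walkthrough.

```python
# Minimal sketch (shapes illustrative, not from the walkthrough): scaled
# dot-product attention, with 1/sqrt(d_k) acting as a softmax temperature.
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # rescale so logit variance stays ~1
    w = np.exp(scores - scores.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ V  # row-wise softmax, then mix values

rng = np.random.default_rng(0)
for d_k in (128, 64, 32):                      # a 128-wide slice split into 1, 2, 4 heads
    Q, K, V = (rng.standard_normal((8, d_k)) for _ in range(3))
    out = attention(Q, K, V)                   # shape (8, d_k)
    print(d_k, round(float((Q @ K.T).std()), 1))  # raw logit std grows like sqrt(d_k)
# Without the rescale, wider heads would saturate the softmax into sharper
# attention; splitting the same width across more heads changes d_k, which is
# the head-count-dependent statistical shift the walkthrough examines.
```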