May 9, 2026 • LLM Architecture
Why GQA Changes the KV Cache Bill Before Quantization
How grouped-query attention changes the KV cache formula before FP8 or INT4: MHA vs MQA vs GQA, decode bandwidth, and serving capacity math.
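As a pointer to the capacity math this teaser mentions, here is a minimal sketch of per-sequence KV cache size under MHA, GQA, and MQA. The layer count, head counts, context length, and precision below are illustrative assumptions, not values taken from the article.

```python
# Minimal sketch (illustrative config, not from the article): per-sequence
# KV cache size under MHA, GQA, and MQA. The cache holds one K and one V
# vector per layer, per KV head, per token position.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem  # 2x: K and V

# Hypothetical 8B-class shape: 32 layers, 32 query heads, 128-dim heads,
# 8192-token context, FP16 (2 bytes per element).
mha = kv_cache_bytes(32, n_kv_heads=32, head_dim=128, seq_len=8192)  # 1 KV head per query head
gqa = kv_cache_bytes(32, n_kv_heads=8,  head_dim=128, seq_len=8192)  # 4 query heads per KV head
mqa = kv_cache_bytes(32, n_kv_heads=1,  head_dim=128, seq_len=8192)  # all queries share 1 KV head
print(f"MHA {mha / 2**30:.2f} GiB | GQA {gqa / 2**30:.2f} GiB | MQA {mqa / 2**30:.3f} GiB")
# MHA 4.00 GiB | GQA 1.00 GiB | MQA 0.125 GiB: GQA shrinks the cache by
# n_q_heads / n_kv_heads before any FP8 or INT4 quantization is applied.
```

The same factor carries over to decode bandwidth, since each generated token rereads the entire cache.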
Transformer internals, context behavior, and model design trade-offs.
A walkthrough of scaled dot-product attention (Q/K/V), softmax temperature, and why increasing head count shifts attention statistics and behavior.
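This second teaser also lends itself to a short worked example. Below is a minimal sketch of scaled dot-product attention and of why the 1/sqrt(d_k) scale behaves like a softmax temperature; the matrix shapes and the head-split interpretation are illustrative assumptions, not values from the walkthrough.

```python
# Minimal sketch (shapes illustrative, not from the walkthrough): scaled
# dot-product attention, with 1/sqrt(d_k) acting as a softmax temperature.
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # rescale so logit variance stays ~1
    w = np.exp(scores - scores.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ V  # row-wise softmax, then mix values

rng = np.random.default_rng(0)
for d_k in (128, 64, 32):                      # a 128-wide slice split into 1, 2, 4 heads
    Q, K, V = (rng.standard_normal((8, d_k)) for _ in range(3))
    out = attention(Q, K, V)                   # shape (8, d_k)
    print(d_k, round(float((Q @ K.T).std()), 1))  # raw logit std grows like sqrt(d_k)
# Without the rescale, wider heads would saturate the softmax into sharper
# attention; splitting the same width across more heads changes d_k, which is
# the head-count-dependent statistical shift the walkthrough examines.
```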