Decoding

At every decode step, the model has just done a forward pass through all 96 layers. The final output isn't a token yet. It's a vector of logits — one number per vocabulary token.

Step 1: softmax converts logits to probabilities.
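
A minimal sketch of Step 1 in plain Python, assuming the logits arrive as a simple list of floats (real models produce a tensor with one entry per vocabulary token). The function name `softmax` and the example logits are illustrative, not taken from any particular library.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-token vocabulary: a higher logit gets a higher probability.
probs = softmax([2.0, 1.0, 0.1])
```

Subtracting the max does not change the result, because softmax only depends on differences between logits.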

Step 2: pick one token from this distribution. ← This is where decoding strategy enters.

Five strategies for picking, in order of complexity.

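1: Greedy (always take the highest-probability token; deterministic in theory)
2: Sampling (draw a token in proportion to its probability)
3: Top-k (sample only from the k most probable tokens)
4: Top-p (nucleus sampling: sample from the smallest set of tokens whose cumulative probability reaches p)
5: Temperature (divide the logits by T before softmax; T < 1 sharpens the distribution, T > 1 flattens it)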
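
The strategies above can be sketched in a few lines each. This is a toy version in plain Python over a 4-token vocabulary, not how any specific inference library implements them; all function names (`greedy`, `sample`, `top_k_filter`, `top_p_filter`) and the example logits are assumptions made for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax with temperature: logits are divided by T first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Return the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    """Draw one token index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def top_k_filter(probs, k):
    """Zero out everything except the k most probable tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
rng = random.Random(0)  # fixed seed so a run is repeatable

token_greedy = greedy(probs)
token_top_k = sample(top_k_filter(probs, k=2), rng)
token_top_p = sample(top_p_filter(probs, p=0.9), rng)
```

Note that top-k, top-p, and temperature all reshape the distribution before the draw, so they are commonly combined rather than used as alternatives.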

How structured output features (constrained decoding) actually work, worth knowing: at each decode step, the library looks at your JSON schema, figures out which tokens are legal at this position (e.g., after { you need a string key or a closing }, not a number), and masks out all illegal tokens before sampling. In practice the mask sets the logits of illegal tokens to negative infinity, so after softmax their probability is exactly zero. The model cannot emit a token that breaks the schema, because those tokens are no longer in the distribution it samples from.
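
A toy sketch of the masking step, under heavy simplifying assumptions: a 6-token vocabulary and a hypothetical lookup table `ALLOWED_NEXT` that stands in for the schema-aware parser (a real library tracks full parser state, not just the previous token). `VOCAB`, `ALLOWED_NEXT`, and `masked_probs` are made-up names for illustration.

```python
import math

# Hypothetical tiny vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ["{", "}", '"key"', ":", "42", ","]

# Hypothetical toy grammar: which tokens may legally follow the previous one.
ALLOWED_NEXT = {
    "{": {'"key"', "}"},
    '"key"': {":"},
    ":": {"42"},
    "42": {",", "}"},
    ",": {'"key"'},
}

def masked_probs(logits, prev_token):
    """Set illegal tokens' logits to -inf, then apply softmax."""
    allowed = ALLOWED_NEXT[prev_token]
    masked = [x if tok in allowed else -math.inf
              for tok, x in zip(VOCAB, logits)]
    m = max(masked)  # finite as long as at least one token is allowed
    exps = [math.exp(x - m) for x in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]

# The model strongly prefers "42", but right after "{" that token is illegal.
logits = [0.1, 0.2, 0.3, 0.1, 5.0, 0.0]
probs = masked_probs(logits, prev_token="{")
best = VOCAB[max(range(len(probs)), key=lambda i: probs[i])]
```

Here the raw logits favor `42`, yet after masking only `"key"` and `}` carry any probability, so any strategy from the list above can only pick one of those two.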