At every decode step, the model has just done a forward pass through all 96 layers. The final output isn't a token yet. It's a vector of logits — one number per vocabulary token.
Step 1: softmax converts logits to probabilities.
Step 2: pick one token from this distribution. ← This is where decoding strategy enters.
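A minimal sketch of the two steps above, using only the Python standard library. The 4-token vocabulary, the logit values, and the helper names (softmax, greedy_pick, sample_pick) are assumptions made up for illustration. Greedy and plain sampling appear here only as two simple ways to do Step 2.

```python
import math
import random

def softmax(logits):
    """Step 1: turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Step 2, option A: always take the most likely token id."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    """Step 2, option B: draw a token id at random, weighted by probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for a toy 4-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

greedy_id = greedy_pick(probs)                      # token 0 has the largest logit
sampled_id = sample_pick(probs, random.Random(0))   # some id in 0..3, seed-dependent
```

Softmax preserves the ordering of the logits, so the greedy pick is the same whether you take the argmax of the logits or of the probabilities. The probabilities only matter once you sample.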
Five strategies for picking, in order of complexity.
How structured output features (constrained decoding) actually work (worth knowing): at each decode step, the library looks at your JSON schema, figures out which tokens are legal at this position (e.g., after { you need a string key, not a number), and masks out all illegal tokens before sampling. The model cannot emit a token that breaks the schema, because illegal tokens are given zero probability (in practice, their logits are set to negative infinity before softmax). The one caveat: a generation cut off by the token limit can still stop mid-object.
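The masking step can be sketched as follows. This is a toy illustration, not how any particular library implements it: the 5-token vocabulary, the legal_next lookup table (a hand-written stand-in for a real schema or grammar engine), and fake_model_logits are all hypothetical.

```python
import json
import math
import random

VOCAB = ["{", "}", '"name"', ":", "42"]

def legal_next(prev):
    """Hypothetical stand-in for a schema engine: token ids allowed after prev."""
    rules = {
        None: {"{"},          # output must start with an opening brace
        "{": {'"name"'},      # after { comes a string key, not a number
        '"name"': {":"},
        ":": {"42"},
        "42": {"}"},
    }
    return {VOCAB.index(tok) for tok in rules[prev]}

def masked_softmax(logits, allowed):
    """Set illegal logits to -inf, then softmax; illegal tokens get probability 0."""
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]

def fake_model_logits(prefix):
    """Hypothetical model that always prefers "42", even where it is illegal."""
    return [0.0, 0.0, 0.0, 0.0, 5.0]

rng = random.Random(0)
out, prev = [], None
while prev != "}":
    probs = masked_softmax(fake_model_logits(out), legal_next(prev))
    idx = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
    prev = VOCAB[idx]
    out.append(prev)

text = "".join(out)        # '{"name":42}'
parsed = json.loads(text)  # parses cleanly despite the model's bad preferences
```

Even though the fake model puts most of its weight on "42" at every step, the mask removes that option wherever the toy grammar forbids it, so sampling can only land on a legal token.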