This is the full path from a user's message to the model's response. Every concept you have learned plugs in somewhere along it.
Example Models (the landscape you should be able to name)
TRAINING (one-time, by Google/OpenAI/Anthropic)
════════════════════════════════════════════════
Pretrain → SFT → RLHF/DPO → (optional fine-tune)
════════════════════════════════════════════════
│
▼
DEPLOYED MODEL
│
▼
INFERENCE (every request, by you)
═════════════════════════════════
Tokenize → Embed → Position → Transformer blocks
→ Final layer → Logits → Decode → Detokenize
═════════════════════════════════
│
▼
WRAPPED IN A SYSTEM (your job)
══════════════════════════════
RAG (retrieval) + prompting + agents +
eval + observability + security + scaling
══════════════════════════════
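Three time horizons: training (one-time, by the model provider), inference (every request), and the system you build around the model.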
Your job as an FDE is mostly in the third layer, but you need to understand the first two to make good choices there.
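
To make the inference row of the diagram concrete, here is a minimal toy sketch of the same steps in plain Python. Everything in it is an assumption made for illustration: the six-word vocabulary, the random (untrained) weights, the simplified position signal, and the averaging function that stands in for real attention and MLP layers. It will not produce meaningful text. It only shows how the stages hand data to each other.

```python
import math
import random

# Hypothetical toy vocabulary; real models use learned subword tokenizers.
VOCAB = ["<unk>", "hello", "world", "how", "are", "you"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
D_MODEL = 8  # hidden size, chosen arbitrarily for the sketch

rng = random.Random(0)  # fixed seed so the untrained toy weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


EMBEDDINGS = rand_matrix(len(VOCAB), D_MODEL)   # one vector per token id
OUTPUT_PROJ = rand_matrix(D_MODEL, len(VOCAB))  # final layer: hidden -> vocab scores


def tokenize(text):
    """Tokenize: text -> token ids (unknown words map to <unk>, id 0)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]


def embed(token_ids):
    """Embed: token ids -> vectors."""
    return [list(EMBEDDINGS[t]) for t in token_ids]


def add_positions(vectors):
    """Position: add a simplified sine-based signal so order matters."""
    return [
        [x + math.sin(pos / (10000 ** (i / D_MODEL))) for i, x in enumerate(vec)]
        for pos, vec in enumerate(vectors)
    ]


def transformer_blocks(vectors):
    """Stand-in for attention + MLP: each position averages itself with
    all earlier positions, mimicking causal (left-to-right) mixing."""
    mixed = []
    for i in range(len(vectors)):
        window = vectors[: i + 1]
        mixed.append([sum(col) / len(window) for col in zip(*window)])
    return mixed


def final_logits(vectors):
    """Final layer -> logits: score every vocab entry from the last position."""
    last = vectors[-1]
    return [
        sum(last[d] * OUTPUT_PROJ[d][v] for d in range(D_MODEL))
        for v in range(len(VOCAB))
    ]


def decode_greedy(logits):
    """Decode: pick the highest-scoring token id (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def detokenize(token_ids):
    """Detokenize: token ids -> text."""
    return " ".join(VOCAB[t] for t in token_ids)


def generate(prompt, max_new_tokens=3):
    ids = tokenize(prompt) or [0]  # fall back to <unk> for an empty prompt
    for _ in range(max_new_tokens):
        hidden = transformer_blocks(add_positions(embed(ids)))
        ids.append(decode_greedy(final_logits(hidden)))
    return detokenize(ids)


if __name__ == "__main__":
    print(generate("hello world"))
```

Note that the loop in generate reruns the whole pipeline once per new token. That is why the diagram labels inference "every request": the cost recurs with each token you generate, while training is paid once.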