Tokenization

Context Window

Transformers

Decoding

Embedding Model

Hallucination

LLM Pipeline

This is what happens when a user sends a message and gets a response. Every concept you learned plugs in here.

Example Models (the landscape you should be able to name)

                  TRAINING (one-time, by Google/OpenAI/Anthropic)
                  ════════════════════════════════════════════════
                  Pretrain → SFT → RLHF/DPO → (optional fine-tune)
                  ════════════════════════════════════════════════
                                      │
                                      ▼
                              DEPLOYED MODEL
                                      │
                                      ▼
                  INFERENCE (every request, by you)
                  ═════════════════════════════════
                  Tokenize → Embed → Position → Transformer blocks
                       → Final layer → Logits → Decode → Detokenize
                  ═════════════════════════════════
                                      │
                                      ▼
                      WRAPPED IN A SYSTEM (your job)
                      ══════════════════════════════
                      RAG (retrieval) + prompting + agents + 
                      eval + observability + security + scaling
                      ══════════════════════════════
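
The INFERENCE row of the diagram can be walked through in a few dozen lines of plain Python. This is only a toy sketch under loud assumptions: the six-word vocabulary, the random weights, the whitespace tokenizer, and every function name (`tokenize`, `embed`, `add_position`, `transformer_block`, `logits_for_last`, `greedy_decode`, `generate`) are made up for illustration. A real model uses a learned subword tokenizer, trained weights, multi-head attention with learned projections, and feed-forward layers. The point is just the order of the stages.

```python
import math
import random

# Everything below is a toy assumption: tiny vocabulary, random (untrained) weights.
VOCAB = ["<eos>", "hello", "world", "how", "are", "you"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
D_MODEL = 8
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB), D_MODEL)   # token embedding table
W_OUT = rand_matrix(D_MODEL, len(VOCAB))   # final layer: hidden state -> logits

def tokenize(text):
    # Tokenize: text -> token ids (whitespace split stands in for a subword tokenizer).
    return [TOKEN_TO_ID[w] for w in text.lower().split()]

def detokenize(ids):
    # Detokenize: token ids -> text.
    return " ".join(VOCAB[i] for i in ids)

def embed(ids):
    # Embed: look up one vector per token id.
    return [list(EMBED[i]) for i in ids]

def positional(pos, i):
    # Sinusoidal encoding: sine on even dimensions, cosine on odd ones.
    base = i if i % 2 == 0 else i - 1
    angle = pos / (10000 ** (base / D_MODEL))
    return math.sin(angle) if i % 2 == 0 else math.cos(angle)

def add_position(vectors):
    # Position: add position information to each token vector.
    return [[x + positional(pos, i) for i, x in enumerate(vec)]
            for pos, vec in enumerate(vectors)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def transformer_block(vectors):
    # Simplified block: single-head causal self-attention with identity
    # Q/K/V projections, plus a residual connection. No MLP, no layer norm.
    out = []
    for t, q in enumerate(vectors):
        visible = vectors[: t + 1]  # causal mask: attend only to earlier tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D_MODEL) for k in visible]
        weights = softmax(scores)
        attended = [sum(w * v[i] for w, v in zip(weights, visible)) for i in range(D_MODEL)]
        out.append([x + a for x, a in zip(q, attended)])
    return out

def logits_for_last(vectors):
    # Final layer -> logits: one score per vocabulary entry, from the last position.
    last = vectors[-1]
    return [sum(last[i] * W_OUT[i][j] for i in range(D_MODEL)) for j in range(len(VOCAB))]

def greedy_decode(logits):
    # Decode: pick the highest-scoring token (greedy; sampling is the alternative).
    return max(range(len(logits)), key=lambda j: logits[j])

def generate(prompt, max_new_tokens=5, n_blocks=2):
    ids = tokenize(prompt)
    new_ids = []
    for _step in range(max_new_tokens):
        x = add_position(embed(ids + new_ids))
        for _layer in range(n_blocks):
            x = transformer_block(x)
        next_id = greedy_decode(logits_for_last(x))
        if VOCAB[next_id] == "<eos>":
            break
        new_ids.append(next_id)
    return detokenize(new_ids)

print(generate("hello world"))
```

Because the weights are random, the generated text is meaningless. What carries over to real models is the loop: the whole sequence is re-run through the stack for each new token, and generation stops at an end-of-sequence token or a length limit.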

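Three time horizons: training (one-time, done by the model provider), inference (every request), and the system you wrap around the model (your job).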

Your job as an FDE is mostly in the third layer. But you need to understand the first two to make good choices in the third.
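
To make the third layer a little more concrete, here is a minimal sketch of the retrieval-plus-prompting part of such a system. Everything in it is a hypothetical stand-in: the `DOCS` list, the keyword-overlap scoring in `retrieve` (a real system would more likely compare embedding vectors), the prompt wording, and `call_model`, which is a placeholder rather than any real provider's API.

```python
import re

# Hypothetical knowledge base; in practice this would be a document or vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
    "Passwords must be at least 12 characters long.",
]

def words(text):
    # Lowercased word set, used for crude keyword matching.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    # Rank documents by how many words they share with the question.
    # This is a stand-in for embedding similarity search.
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, context):
    # Prompting: put the retrieved context in front of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\n"
    )

def call_model(prompt):
    # Placeholder for the deployed model; swap in a real client call here.
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(question):
    context = retrieve(question, DOCS)
    prompt = build_prompt(question, context)
    return call_model(prompt)

print(build_prompt("When are refunds processed?",
                   retrieve("When are refunds processed?", DOCS)))
```

Evaluation, observability, security, and scaling would wrap around `answer` in the same way: they are ordinary engineering around a model call, which is why this layer is where most of the day-to-day work sits.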