LLM Fundamentals

How an LLM works

LLM Pipeline

This is what happens between a user sending a message and getting a response. Every concept you have learned plugs in somewhere along this path.

Flow Diagram
Walking through one specific example
How Models Are Trained
Why training matters for FDE

Example Models (the landscape you should be able to name)

                  TRAINING (one-time, by Google/OpenAI/Anthropic)
                  ════════════════════════════════════════════════
                  Pretrain → SFT → RLHF/DPO → (optional fine-tune)
                  ════════════════════════════════════════════════
                                      │
                                      ▼
                              DEPLOYED MODEL
                                      │
                                      ▼
                  INFERENCE (every request, by you)
                  ═════════════════════════════════
                  Tokenize → Embed → Position → Transformer blocks
                       → Final layer → Logits → Decode → Detokenize
                  ═════════════════════════════════
                                      │
                                      ▼
                      WRAPPED IN A SYSTEM (your job)
                      ══════════════════════════════
                      RAG (retrieval) + prompting + agents + 
                      eval + observability + security + scaling
                      ══════════════════════════════
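
The inference row of the diagram can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not how any real model is implemented: the vocabulary is six whole words (real tokenizers use subword units), the weights are random stand-ins for learned parameters, and the "transformer blocks" step is a causal running average standing in for attention and MLP layers. All names (VOCAB, generate, and so on) are hypothetical. The only thing it is meant to show is the order of the stages and the one-token-at-a-time loop.

```python
import random

# Hypothetical toy vocabulary; real tokenizers split text into subword units.
VOCAB = ["<eos>", "hello", "world", "how", "are", "you"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
D_MODEL = 8    # width of each token vector
MAX_LEN = 16   # longest sequence this toy supports

rng = random.Random(0)
# Random stand-ins for weights that training would normally learn.
EMBED = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in VOCAB]
POS = [[rng.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(MAX_LEN)]

def tokenize(text):
    """Tokenize: text -> list of token ids."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]

def embed(ids):
    """Embed + Position: token vector plus position vector, per token."""
    return [[e + p for e, p in zip(EMBED[tok], POS[i])]
            for i, tok in enumerate(ids)]

def transformer_blocks(xs):
    """Placeholder for attention + MLP: each position averages itself
    and all earlier positions (causal, so it never looks ahead)."""
    out = []
    for i in range(len(xs)):
        prefix = xs[: i + 1]
        out.append([sum(col) / len(prefix) for col in zip(*prefix)])
    return out

def logits_from(hidden):
    """Final layer -> Logits: score every vocab entry against the
    hidden state of the last position."""
    last = hidden[-1]
    return [sum(h * w for h, w in zip(last, row)) for row in EMBED]

def decode_greedy(logits):
    """Decode: pick the highest-scoring token id (greedy)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def detokenize(ids):
    """Detokenize: token ids -> text."""
    return " ".join(VOCAB[i] for i in ids)

def generate(prompt, max_new_tokens=3):
    """Run the whole pipeline once per generated token."""
    ids = tokenize(prompt)
    new_ids = []
    for _ in range(max_new_tokens):
        hidden = transformer_blocks(embed(ids + new_ids))
        next_id = decode_greedy(logits_from(hidden))
        if next_id == TOKEN_TO_ID["<eos>"]:
            break
        new_ids.append(next_id)
    return detokenize(new_ids)

if __name__ == "__main__":
    print(generate("how are you"))
```

Because the weights are random, the generated words are meaningless. The point is that every new token reruns the full Tokenize-to-Decode path over the prompt plus everything generated so far, which is why output length drives both latency and cost.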

Three time horizons:

Training: weeks to months, tens to hundreds of millions of dollars or more per frontier run, done by labs
Inference: milliseconds to seconds per request, you pay per call (usually billed per token)
System: the architecture around the model, where FDE work lives
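
The inference cost is easy to estimate once you know your token counts. Here is a minimal sketch, assuming the common hosted-API billing model where input and output tokens are priced separately per million tokens. The function name and the prices are hypothetical placeholders, not any vendor's real rates; check the current price sheet for the model you use.

```python
def request_cost_usd(input_tokens, output_tokens,
                     usd_per_million_input, usd_per_million_output):
    """Cost of one request when input and output tokens are billed
    separately, each at a price per one million tokens."""
    total = (input_tokens * usd_per_million_input
             + output_tokens * usd_per_million_output)
    return total / 1_000_000

# Hypothetical prices, for illustration only.
PRICE_IN = 3.0    # USD per 1M input tokens (assumed)
PRICE_OUT = 15.0  # USD per 1M output tokens (assumed)

per_request = request_cost_usd(2_000, 500, PRICE_IN, PRICE_OUT)
per_month = per_request * 100_000  # at 100,000 requests per month

if __name__ == "__main__":
    print(f"per request: ${per_request:.4f}")
    print(f"per month:   ${per_month:,.2f}")
```

With these assumed numbers, a 2,000-token prompt with a 500-token answer costs about 1.35 cents, or roughly $1,350 per month at 100,000 requests. Stuffing more context into the prompt raises the input side of that bill on every single call.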

Your job as an FDE is mostly in the third layer. But you need to understand the first two to make good choices in the third.
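
To make the third layer concrete, here is a minimal sketch of two pieces from the system row of the diagram: retrieval and prompting. It assumes a naive word-overlap score in place of a real embedding search, and call_model is a hypothetical stand-in for whatever hosted model API you use. Every name here is illustrative.

```python
def score(query, doc):
    """Naive relevance: number of distinct words shared by query and doc.
    A real system would typically use embedding similarity instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, docs, k=2):
    """Retrieval: return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Prompting: put the retrieved context in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def call_model(prompt):
    """Hypothetical stand-in for a hosted model call (the inference step)."""
    return f"[model response to a {len(prompt)}-character prompt]"

if __name__ == "__main__":
    docs = [
        "refunds are processed within 5 days",
        "our office is in Berlin",
        "shipping takes 2 days",
    ]
    prompt = build_prompt("how are refunds processed", docs, k=1)
    print(prompt)
    print(call_model(prompt))
```

The model itself is a single function call in the middle. Everything else (what gets retrieved, how the prompt is assembled, and later how results are evaluated and logged) is code you own.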