Offline eval tells you the system can be good. Observability tells you whether it is good right now. This is where the JD's "tokens/sec, cost-per-request" requirement lives.
- 3.1 The Three Pillars (extended for LLMs)
- 3.2 Tracing — Why It's the Most Important Tool
- 3.3 The LLM-Specific Metrics (this is what the JD verbatim calls out)
- 3.4 The Stack
- 3.5 Sampling Strategy
- 3.6 PII and Logging — The Security Tie-In