Every System Has 4 Layers to Reason Through
```text
LOAD → [Ingress / Load Balancer]
          ↓
       [Application Layer]   ← stateless, horizontally scalable
          ↓
       [Data Layer]          ← stateful, the hard part
          ↓
       [External Deps]       ← auth, LLM APIs, payment, etc.
```
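To make the stateless/stateful split concrete, here is a minimal in-process sketch. It assumes a dict as a stand-in for the data layer and round-robin as the balancing policy; both are illustrative choices, and `DataLayer`, `AppInstance`, and `RoundRobinBalancer` are hypothetical names. Because no instance keeps per-user state, consecutive requests from the same user can land on different instances and still see the same data.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class DataLayer:
    """Stand-in for the stateful tier (a database); here just an in-memory dict."""
    carts: dict[str, list[str]] = field(default_factory=dict)


class AppInstance:
    """Stateless handler: keeps no per-user data between requests."""

    def __init__(self, name: str, db: DataLayer) -> None:
        self.name = name
        self.db = db

    def add_to_cart(self, user_id: str, item: str) -> list[str]:
        # All state is read from and written back to the shared data layer,
        # so any instance can serve any request.
        cart = self.db.carts.setdefault(user_id, [])
        cart.append(item)
        return list(cart)


class RoundRobinBalancer:
    """Ingress tier: hands requests to instances in rotation."""

    def __init__(self, instances: list[AppInstance]) -> None:
        self._cycle = itertools.cycle(instances)

    def route(self) -> AppInstance:
        return next(self._cycle)


# Usage: two stateless instances share one data layer.
db = DataLayer()
lb = RoundRobinBalancer([AppInstance("app-1", db), AppInstance("app-2", db)])
lb.route().add_to_cart("alice", "book")          # served by app-1
second = lb.route()                              # app-2
print(second.name, second.add_to_cart("alice", "pen"))  # app-2 ['book', 'pen']
```

Adding capacity is then just appending another `AppInstance` to the balancer; the hard scaling work moves to the data layer, which is exactly where the diagram puts it.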
Scalability patterns:
| Problem | Solution | Trade-off |
|---|---|---|
| Too many requests | Horizontal scaling (more instances) | App tier must be stateless |
| Slow DB reads | Read replicas | Eventual consistency on reads |
| Hot DB rows | Caching (Redis, Memcached) | Cache invalidation complexity |
| Large datasets | Sharding / partitioning | Cross-shard queries become expensive |
| Bursty traffic | Queue-based async processing | Adds latency, complexity |
| Global users | CDN + multi-region | Data residency, sync complexity |
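For the "Hot DB rows" row, a common approach is cache-aside (lazy loading). The sketch below is a minimal single-process version, assuming `load` and `store` callables as hypothetical stand-ins for database reads and writes and an in-memory dict in place of Redis or Memcached. Writes invalidate the cached key rather than updating it, and a TTL bounds how long a value written behind the cache's back can stay stale.

```python
import time
from typing import Callable


class CacheAside:
    """Cache-aside (lazy-loading) cache with a TTL in front of a slower store."""

    def __init__(
        self,
        load: Callable[[str], str],
        store: Callable[[str, str], None],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load            # hypothetical DB read
        self._store = store          # hypothetical DB write
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read through to the store and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def put(self, key: str, value: str) -> None:
        # Write the source of truth first, then invalidate the cached copy
        # so the next read reloads it instead of serving a stale value.
        self._store(key, value)
        self._entries.pop(key, None)


# Usage: a plain dict stands in for the database.
db = {"user:1": "Alice"}
cache = CacheAside(load=db.__getitem__, store=db.__setitem__, ttl_seconds=30)
cache.get("user:1")             # miss: loaded from db
cache.get("user:1")             # hit
cache.put("user:1", "Alicia")   # write + invalidate
print(cache.get("user:1"), cache.hits, cache.misses)  # Alicia 1 2
```

The invalidation trade-off from the table shows up directly here: `put` only keeps this one cache honest, so any other writer that updates the database directly leaves a stale entry until its TTL expires.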
Caching tiers (from closest to user to furthest):
Async vs Sync:
Consistency trade-offs (CAP theorem — one sentence version):
For FDE roles, system design questions will often have an AI component. Many GenAI patterns map directly onto standard distributed-systems patterns:
| Standard Pattern | GenAI Equivalent |
|---|---|
| Caching hot DB queries | Semantic caching (cache similar prompts, not just exact matches) |
| Request routing (fast path vs slow path) | Model routing: Flash for triage → Pro for hard queries (cost + latency optimization) |
| Async queue | Batch embedding jobs, offline document ingestion |
| Read replicas | Serving a pre-built vector index vs re-indexing live |
| Rate limiting | Per-tenant token budget, per-user request throttling |
| CDN | Prompt template caching, pre-computed FAQ answers |
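Semantic caching differs from the exact-match cache above in one step: lookup compares prompt embeddings by similarity instead of comparing keys. The sketch below assumes a pluggable `embed` function (in practice an embedding-model call) and a cosine-similarity threshold you would tune; `toy_embed`, its tiny vocabulary, and the 0.8 threshold are hypothetical stand-ins so the example runs without a model, and the linear scan would be a vector index in a real system.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a cached answer when a new prompt is close enough to a stored one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9) -> None:
        self._embed = embed
        self._threshold = threshold
        self._entries: list[tuple[Vector, str]] = []  # (prompt embedding, answer)

    def lookup(self, prompt: str) -> Optional[str]:
        query = self._embed(prompt)
        best_score, best_answer = -1.0, None
        # Linear scan for clarity; a production system would query a vector index.
        for vec, answer in self._entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self._threshold else None

    def store(self, prompt: str, answer: str) -> None:
        self._entries.append((self._embed(prompt), answer))


# Hypothetical toy embedding: word counts over a fixed vocabulary.
VOCAB = ["reset", "password", "refund", "order", "my", "how", "do", "i", "the"]


def toy_embed(text: str) -> Vector:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


cache = SemanticCache(toy_embed, threshold=0.8)
cache.store("How do I reset my password?", "Use the 'Forgot password' link.")
print(cache.lookup("how do i reset the password"))  # → Use the 'Forgot password' link.
print(cache.lookup("refund my order"))              # → None
```

The threshold is the key design knob: set it too low and different questions share an answer (wrong responses), too high and the cache degenerates into exact matching.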