System Design

Every System Has 4 Layers to Reason Through

LOAD →  [Ingress / Load Balancer]
           ↓
        [Application Layer]  ← stateless, horizontally scalable
           ↓
        [Data Layer]         ← stateful, the hard part
           ↓
        [External Deps]      ← auth, LLM APIs, payment, etc.

The Core Vocabulary

Scalability patterns:

Problem	Solution	Trade-off
Too many requests	Horizontal scaling (more instances)	Stateless requirement
Slow DB reads	Read replicas	Eventual consistency on reads
Hot DB rows	Caching (Redis, Memcached)	Cache invalidation complexity
Large datasets	Sharding / partitioning	Cross-shard queries become expensive
Bursty traffic	Queue-based async processing	Adds latency, complexity
Global users	CDN + multi-region	Data residency, sync complexity
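
To make the sharding row concrete, here is a minimal sketch of hash-based shard routing. The names (shard_for, NUM_SHARDS, the user-N keys) are hypothetical, and the in-memory lists stand in for separate database shards. A lookup by key touches one shard, while a query that is not keyed on the shard key has to fan out to all of them.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# In-memory lists standing in for separate database shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for n in range(1000):
    user_id = f"user-{n}"
    shards[shard_for(user_id)].append(user_id)


def find_user(user_id: str) -> bool:
    """Single-shard query: only the owning shard is consulted."""
    return user_id in shards[shard_for(user_id)]


def count_users_ending_in(suffix: str) -> int:
    """Cross-shard query: every shard must be scanned and the results merged."""
    return sum(
        1 for shard in shards.values() for uid in shard if uid.endswith(suffix)
    )
```

One design note: plain modulo hashing remaps most keys when the shard count changes, which is why consistent hashing is often used when shards are added or removed regularly.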

Caching tiers (from closest to user to furthest):

CDN — static assets, edge caching
App-level — in-memory (Redis), session data, hot query results
DB-level — query cache, materialized views
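
The app-level tier is usually implemented as cache-aside: check the cache, fall through to the database on a miss, then populate the cache. The sketch below assumes a plain dict with a TTL as a stand-in for Redis; TTLCache and slow_db_lookup are hypothetical names, not a real client API.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper; a dict stands in for Redis here."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], object]) -> object:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the slower tier
        value = loader(key)  # cache miss: fall through to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)


db_calls = []  # records how often the slow tier is actually hit


def slow_db_lookup(key: str) -> str:
    db_calls.append(key)
    return f"row-for-{key}"


cache = TTLCache(ttl_seconds=30.0)
first = cache.get_or_load("user:1", slow_db_lookup)
second = cache.get_or_load("user:1", slow_db_lookup)  # served from cache
```

The invalidate call is where the "cache invalidation complexity" trade-off lives: every write path has to remember to call it, or readers see stale data until the TTL expires.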

Async vs Sync:

Sync: the user waits for the response. Simple to reason about, but every call in the path must meet a tight latency SLA.
Async (queue/pub-sub): decouple producer from consumer. Good for: ingestion pipelines, long-running tasks, fan-out. Cost: harder to reason about failures.
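
A minimal sketch of the async pattern, using Python's standard queue and threading modules as a stand-in for a real broker (SQS, Pub/Sub, Kafka, and so on). handle_request and worker are hypothetical names. The producer returns "accepted" immediately, and the work happens later on the consumer side.

```python
import queue
import threading

jobs = queue.Queue()  # stands in for a message broker
results = []
results_lock = threading.Lock()


def handle_request(doc_id: int) -> str:
    """Producer: enqueue the job and return without waiting for the work."""
    jobs.put(doc_id)
    return "accepted"


def worker() -> None:
    """Consumer: drain the queue until a None sentinel arrives."""
    while True:
        doc_id = jobs.get()
        try:
            if doc_id is None:
                return
            with results_lock:
                results.append(f"processed-{doc_id}")
        finally:
            jobs.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

statuses = [handle_request(i) for i in range(5)]  # callers are not blocked

jobs.join()     # wait until every queued job has been processed
jobs.put(None)  # tell the worker to stop
consumer.join()
```

The failure-reasoning cost shows up as soon as the worker can crash mid-job: a real setup needs retries, idempotent handlers, and usually a dead-letter queue, none of which this sketch attempts.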

Consistency trade-offs (CAP theorem — one sentence version):

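A distributed system cannot guarantee consistency, availability, and partition tolerance all at once; because network partitions are unavoidable, the real choice during a partition is between consistency and availability.
In practice: choose between strong consistency (the user always sees the latest data, at higher latency) and eventual consistency (faster, but reads may be briefly stale).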
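
The stale-read behavior can be shown with a toy primary/replica pair. This is only an illustration under simplifying assumptions: Primary, Replica, and replicate are hypothetical classes, and replication is triggered by hand rather than running continuously with lag as it would in a real database.

```python
class Primary:
    """Accepts writes and records them for later shipping to the replica."""

    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    """Serves reads from whatever changes it has applied so far."""

    def __init__(self):
        self.data = {}

    def apply(self, changes):
        for key, value in changes:
            self.data[key] = value


def replicate(primary, replica):
    """Ship pending changes; manual here, continuous in a real system."""
    replica.apply(primary.pending)
    primary.pending = []


primary, replica = Primary(), Replica()
primary.write("plan", "free")
replicate(primary, replica)
primary.write("plan", "pro")  # not yet replicated

strong_read = primary.data["plan"]     # reads the primary: latest value
stale_read = replica.data["plan"]      # reads the replica: old value
replicate(primary, replica)
converged_read = replica.data["plan"]  # replica has caught up
```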

The GenAI-Specific Layer

For FDE, system design questions will often have an AI component. Most GenAI patterns map directly onto standard distributed-systems patterns:

Standard Pattern	GenAI Equivalent
Caching hot DB queries	Semantic caching (cache similar prompts, not just exact matches)
Tiered request routing	Model routing: Flash for triage → Pro for hard queries (cost + latency optimization)
Async queue	Batch embedding jobs, offline document ingestion
Read replicas	Serving a pre-built vector index vs re-indexing live
Rate limiting	Per-tenant token budget, per-user request throttling
CDN	Prompt template caching, pre-computed FAQ answers
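
Semantic caching is the least familiar row in the table, so here is a rough sketch of it. It rests on a loud assumption: embed is a toy bag-of-words counter standing in for a real embedding model, and the 0.8 threshold is arbitrary. SemanticCache and its methods are hypothetical names. The point is only that the lookup matches on similarity rather than on exact string equality.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed value; tune per workload
        self.entries = []           # list of (embedding, cached answer)

    def store(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

    def lookup(self, prompt: str):
        """Return the answer of the most similar cached prompt, or None."""
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.store("how do i reset my password", "Use the reset link on the login page.")

hit = cache.lookup("how do I reset my password please")  # similar wording
miss = cache.lookup("what is the refund policy")         # unrelated prompt
```

A linear scan is fine for a sketch; at scale the lookup would typically go through a vector index, and the threshold becomes a correctness knob, since setting it too low returns the wrong cached answer.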