System Design Cheatsheet
Scalability, load balancing, caching, databases, microservices & distributed systems fundamentals
A structured framework for tackling any system design question in 45-60 minutes.
💡 4-Step Framework
- Step 1 – Requirements (5 min): Functional (what does it do?) & Non-functional (scale, latency, availability)
- Step 2 – Estimation (5 min): Users, QPS, storage, bandwidth
- Step 3 – High-Level Design (15 min): Draw main components (client, LB, servers, DB, cache)
- Step 4 – Deep Dive (20 min): Discuss trade-offs, bottlenecks, database schema, APIs
Questions to Ask
- How many users? (DAU/MAU)
- Read-heavy or write-heavy?
- What's the expected latency? (p99 < 200ms?)
- Availability target? (99.9% = 8.7 hours downtime/year)
- Consistency requirement? (strong vs eventual)
- Geographic distribution?
Vertical Scaling (Scale Up)
- Add more CPU/RAM/disk to one machine
- Simple – no code changes
- Limited by hardware ceiling
- Single point of failure
- Good for: databases, early-stage apps
Horizontal Scaling (Scale Out)
- Add more machines to the pool
- Complex – needs load balancing, state management
- Virtually unlimited scaling
- Better fault tolerance
- Good for: web servers, microservices
+----------------------------------------------------------------+
|                    High-Level Architecture                     |
|                                                                |
|  Clients -> DNS -> CDN -> Load Balancer -> App Servers (N)     |
|                                 |               |              |
|                           Cache (Redis)   Message Queue        |
|                                 |               |              |
|                             Database       Workers (N)         |
|                       (Primary + Replicas)                     |
+----------------------------------------------------------------+
Distributes incoming traffic across multiple servers. Sits between clients and backend servers.
Types
| Type | Layer | How it works |
|------|-------|--------------|
| L4 (Transport) | TCP/UDP | Routes based on IP + port. Fast, no content inspection. (e.g., AWS NLB) |
| L7 (Application) | HTTP | Routes based on URL, headers, cookies. Can do SSL termination. (e.g., Nginx, ALB) |
Algorithms
| Algorithm | Description | Best For |
|-----------|-------------|----------|
| Round Robin | Requests distributed sequentially | Equal-capacity servers |
| Weighted Round Robin | Proportional to server weight | Different server specs |
| Least Connections | Send to server with fewest active connections | Long-lived connections |
| IP Hash | Hash client IP to pick server | Session affinity (sticky sessions) |
| Consistent Hashing | Minimize re-mapping when servers added/removed | Distributed caches, DB sharding |
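The consistent-hashing row above can be sketched as a hash ring with virtual nodes. This is a minimal illustration, not any particular library's API: the `ConsistentHashRing` class, the MD5 hash, and the 100-virtual-node default are all assumptions of this sketch.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Hash ring with virtual nodes: removing a server remaps only
    the keys that fell in that server's ring segments."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas   # virtual nodes per server
        self.ring = []             # sorted list of (hash, server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get(self, key: str) -> str:
        # First virtual node clockwise from the key's hash
        idx = bisect.bisect(self.ring, (_hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing()
for s in ("cache-1", "cache-2", "cache-3"):
    ring.add(s)
server = ring.get("user:42")  # same key always maps to the same server
```

Removing a node only remaps keys that were on that node, which is exactly the property that makes this scheme suit distributed caches and DB sharding.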
📌 Key Points
- Use health checks to remove unhealthy servers
- Use active-passive LB pairs for high availability
- SSL termination at LB reduces server load
Store frequently accessed data in fast storage (memory) to reduce latency and database load.
Cache Strategies
| Strategy | Read | Write | Use Case |
|----------|------|-------|----------|
| Cache-Aside | App checks cache → miss → read DB → populate cache | App writes to DB, invalidates cache | General purpose, most common |
| Read-Through | Cache checks DB on miss automatically | Same as cache-aside | Simpler app code |
| Write-Through | Same as cache-aside | Write to cache + DB synchronously | Strong consistency needed |
| Write-Behind | Same as cache-aside | Write to cache, async write to DB | Write-heavy, eventual consistency OK |
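The cache-aside row above can be sketched in a few lines. This is a toy illustration: the in-process dicts stand in for Redis and the database, and `get_user`/`update_user` are names invented here.

```python
import time

db = {"user:1": {"name": "Ada"}}           # stand-in for the database
cache: dict[str, tuple[float, dict]] = {}  # key -> (expiry, value)
TTL = 60.0

def get_user(key: str):
    """Cache-aside read: check cache, fall back to DB, populate cache."""
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                    # cache hit
    value = db.get(key)                    # cache miss -> read DB
    if value is not None:
        cache[key] = (time.time() + TTL, value)
    return value

def update_user(key: str, value: dict) -> None:
    """Cache-aside write: write the DB, then invalidate the cache entry."""
    db[key] = value
    cache.pop(key, None)                   # invalidate rather than update
```

Invalidating instead of updating on write avoids a race where a concurrent reader writes a stale value back into the cache.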
Eviction Policies
- LRU (Least Recently Used) – Most common. Evicts the least recently accessed item. Works well for most workloads.
- LFU (Least Frequently Used) – Evicts the least frequently accessed item. Better for skewed access patterns.
- TTL (Time To Live) – Expire after a fixed time. Simple, prevents stale data.
- FIFO – First in, first out. Simple but not access-aware.
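LRU eviction is easy to sketch with Python's `OrderedDict`, which keeps insertion/access order (a standard textbook construction; the `LRUCache` class name is this sketch's own).

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: most recently used items live at the end of
    the OrderedDict; eviction pops from the front."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```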
⚠️ Cache Problems
- Cache stampede: Many requests hit the DB when a cache entry expires → use a lock/semaphore or stale-while-revalidate
- Cache inconsistency: DB updated but cache stale → use short TTLs + invalidation
- Hot key: One key gets massive traffic → replicate across nodes or use a local cache
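The lock-based stampede protection mentioned above can be sketched with a single-flight lock: on a miss, one caller rebuilds the entry while the rest wait. All names here (`get`, `load_from_db`, the global dicts) are invented for the sketch; a production version would use a per-key lock or Redis.

```python
import threading
import time

_cache: dict[str, tuple[float, str]] = {}
_lock = threading.Lock()
db_calls = 0

def load_from_db(key: str) -> str:
    """Stand-in for a slow database read."""
    global db_calls
    db_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"

def get(key: str, ttl: float = 60.0) -> str:
    """On a miss, only the first caller rebuilds the entry;
    concurrent callers block on the lock instead of hitting the DB."""
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    with _lock:
        # Re-check: another thread may have repopulated while we waited
        entry = _cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        value = load_from_db(key)
        _cache[key] = (time.time() + ttl, value)
        return value

threads = [threading.Thread(target=get, args=("feed:home",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Ten concurrent misses, but the DB is read only once
```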
Where to Cache
- Client-side: Browser cache (HTTP headers: Cache-Control, ETag)
- CDN: Static content at edge locations (images, JS, CSS)
- Application-level: Redis/Memcached for API responses, session data
- Database: Query cache, materialized views
Network of edge servers geographically distributed to serve content closer to users.
Push CDN
- You upload content to CDN
- Content available before first request
- You control what's cached
- Good for: rarely changing content
Pull CDN
- CDN fetches from origin on first request
- Lazy population – less storage needed
- First request is slow (cache miss)
- Good for: dynamic, frequently changing content
Common CDNs: CloudFront (AWS), Cloudflare, Fastly, Akamai
SQL (Relational)
- Structured schema, tables, rows
- ACID transactions
- JOIN support
- Strong consistency
- Vertical scaling (primarily)
- Use for: Financial data, user accounts, relational data
- Examples: PostgreSQL, MySQL
NoSQL
- Flexible/schemaless
- BASE (Basically Available, Soft state, Eventually consistent)
- No complex JOINs
- Eventual consistency (usually)
- Horizontal scaling (built-in)
- Use for: Real-time analytics, social feeds, IoT data
- Examples: MongoDB, Cassandra, DynamoDB
NoSQL Types
| Type | Model | Examples | Best For |
|------|-------|----------|----------|
| Key-Value | Simple key–value pairs | Redis, DynamoDB | Caching, sessions, config |
| Document | JSON-like documents | MongoDB, CouchDB | Flexible schemas, CMS, catalogs |
| Column-Family | Columns grouped in families | Cassandra, HBase | Time-series, analytics, heavy writes |
| Graph | Nodes + edges | Neo4j, Neptune | Social networks, recommendation engines |
In a distributed system, you can only guarantee two out of three:
| Property | Meaning |
|----------|---------|
| C – Consistency | Every read gets the most recent write (all nodes see the same data) |
| A – Availability | Every request gets a response (even if it's not the latest data) |
| P – Partition Tolerance | The system continues operating despite network partitions between nodes |
Since network partitions are inevitable, you really choose between:
- CP (Consistency + Partition tolerance): System may reject requests during partition. Examples: MongoDB, HBase, ZooKeeper
- AP (Availability + Partition tolerance): System always responds but may return stale data. Examples: Cassandra, DynamoDB, CouchDB
📌 In practice: Most systems aren't purely CP or AP – they make trade-offs per operation. For example, a bank transfer needs CP, but a social media feed can use AP.
| Pattern | Description | Trade-off |
|---------|-------------|-----------|
| Strong Consistency | Read always returns latest write. All replicas synchronized. | Higher latency, lower throughput |
| Eventual Consistency | Replicas converge over time. Reads may be stale temporarily. | Lower latency, higher availability |
| Read-Your-Writes | User always sees their own writes immediately. | Balanced – good UX without full consistency |
| Causal Consistency | Operations that are causally related are seen in the same order by all. | Weaker than strong but preserves logical order |
Split data across multiple databases/servers to handle more data than a single machine can.
Strategies
| Strategy | How | Pros | Cons |
|----------|-----|------|------|
| Hash-based | shard = hash(key) % N | Even distribution | Hard to add/remove shards (re-hashing) |
| Range-based | Users A-M → Shard 1, N-Z → Shard 2 | Simple, range queries work | Hot spots if distribution is uneven |
| Consistent Hashing | Hash ring, minimal remapping | Easy to add/remove nodes | More complex implementation |
| Geo-based | Data by geographic region | Low latency for local users | Cross-region queries are expensive |
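The re-hashing cost in the hash-based row is easy to demonstrate: changing `N` in `hash(key) % N` remaps most keys, which is what consistent hashing avoids. The `shard_for` helper and SHA-1 choice are assumptions of this sketch.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: stable only for a fixed shard count."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % num_shards

keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
# Going from 4 to 5 shards remaps roughly 80% of keys, which is
# why resharding with modulo hashing is so painful in practice.
```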
⚠️ Sharding Challenges
- Cross-shard joins are very expensive → denormalize data
- Rebalancing when adding shards is complex
- Hot shards if data isn't evenly distributed (celebrity problem)
- Consider: do you really need sharding? Try read replicas + caching first.
Copy data across multiple servers for availability, fault tolerance, and read scalability.
| Type | How | Consistency | Use Case |
|------|-----|-------------|----------|
| Single-Leader | One primary handles writes, replicas handle reads | Strong (sync) or eventual (async) | Most common: PostgreSQL, MySQL |
| Multi-Leader | Multiple nodes accept writes | Conflict resolution needed | Multi-datacenter, offline-first apps |
| Leaderless | Any node can accept reads/writes (quorum-based) | Configurable (R + W > N) | Cassandra, DynamoDB |
Quorum reads/writes: With N replicas, write to W nodes, read from R nodes. If R + W > N, you're guaranteed to read latest write.
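The R + W > N overlap guarantee above can be verified by brute force for small replica counts: every read quorum must share at least one node with every write quorum. The `read_sees_latest` function is a name invented for this check.

```python
import itertools

def read_sees_latest(n: int, w: int, r: int) -> bool:
    """Return True if every read quorum of size r overlaps every
    write quorum of size w among n replicas."""
    nodes = range(n)
    for wq in itertools.combinations(nodes, w):
        for rq in itertools.combinations(nodes, r):
            if not set(wq) & set(rq):
                return False        # found disjoint quorums: stale read possible
    return True

assert read_sees_latest(3, 2, 2)      # R + W = 4 > 3: always overlaps
assert not read_sees_latest(3, 1, 1)  # R + W = 2 <= 3: stale reads possible
```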
Decouple producers and consumers. Enable async processing, load leveling, and fault tolerance.
Producer → [ Message Queue ] → Consumer(s)
           (Kafka, RabbitMQ, SQS)
Benefits:
• Decoupling: producer doesn't wait for consumer
• Buffering: handle traffic spikes gracefully
• Retry: failed messages can be re-processed
• Ordering: Kafka guarantees order within a partition
Queue vs Pub/Sub
| Feature | Message Queue | Pub/Sub |
|---------|---------------|---------|
| Model | Point-to-point | Broadcast to all subscribers |
| Consumer | One consumer per message | All subscribers get every message |
| Use case | Task processing, work distribution | Event notifications, real-time updates |
| Examples | SQS, RabbitMQ | Kafka topics, SNS, Redis Pub/Sub |
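The point-to-point model can be sketched in-process with Python's `queue.Queue` standing in for the broker (SQS/RabbitMQ); the producer enqueues and moves on while a worker drains the queue. The `None` sentinel shutdown is a convention of this sketch.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()   # stand-in for the broker
results = []

def worker() -> None:
    """Consumer: pull messages until the None sentinel arrives."""
    while True:
        msg = tasks.get()
        if msg is None:
            break
        results.append(f"processed {msg}")

t = threading.Thread(target=worker)
t.start()
for i in range(3):                   # producer enqueues without waiting
    tasks.put(f"job-{i}")
tasks.put(None)                      # signal shutdown
t.join()
```

Each message is consumed exactly once here; in the pub/sub model every subscriber would instead get its own copy.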
Monolith
- Single deployable unit
- Shared database
- Simple deployment & debugging
- All-or-nothing scaling
- Good for: small teams, MVPs, simple apps
Microservices
- Multiple independent services
- Own database per service
- Independent deployment & scaling
- Complex: needs service discovery, distributed tracing
- Good for: large teams, complex domains, scale needs
Communication Patterns
| Pattern | Type | Description |
|---------|------|-------------|
| REST/HTTP | Synchronous | Simple, widely understood. Tight coupling. |
| gRPC | Synchronous | Binary protocol, auto-generated clients. Faster than REST. |
| Message Queue | Asynchronous | Decoupled, resilient. More complex debugging. |
| Event Sourcing | Asynchronous | Store events as source of truth. Full audit trail. |
Key Patterns
- API Gateway: Single entry point. Handles routing, auth, rate limiting. (Kong, AWS API Gateway)
- Service Discovery: Services find each other dynamically. (Consul, Kubernetes DNS)
- Circuit Breaker: Stop calling a failing service. Prevent cascading failures. (Resilience4j)
- Saga Pattern: Manage distributed transactions. Each service runs a local transaction + compensating action on failure.
- CQRS: Separate read and write models. Optimize each independently.
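The circuit-breaker pattern above can be sketched with a failure counter and a cooldown; this is a minimal illustration (the `CircuitBreaker` class is invented here), not the Resilience4j API, which also distinguishes an explicit half-open state.

```python
import time

class CircuitBreaker:
    """Open after max_failures consecutive failures; fail fast while
    open; after reset_after seconds, let one probe call through."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # cooldown elapsed: half-open, allow one probe request
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            self.opened_at = time.time()
            raise
        self.failures = 0            # a success closes the circuit
        return result
```

Failing fast while the circuit is open is what stops a slow, dying dependency from tying up every caller's threads and cascading the outage upstream.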
| Style | Data Format | Best For |
|-------|-------------|----------|
| REST | JSON over HTTP | CRUD APIs, web apps, public APIs |
| GraphQL | Query language | Flexible queries, mobile apps (reduce over-fetching) |
| gRPC | Protocol Buffers | Internal microservice communication, streaming |
| WebSocket | Full-duplex | Real-time: chat, live updates, gaming |
REST Best Practices
# Use nouns, not verbs
GET /api/v1/users # list users
GET /api/v1/users/123 # get user 123
POST /api/v1/users # create user
PUT /api/v1/users/123 # update user 123 (full replace)
PATCH /api/v1/users/123 # partial update
DELETE /api/v1/users/123 # delete user 123
# Pagination
GET /api/v1/users?page=2&limit=20
GET /api/v1/users?cursor=abc123&limit=20 # cursor-based (better for large datasets)
# Filtering & Sorting
GET /api/v1/users?status=active&sort=-created_at
# Status codes
200 OK # Success
201 Created # Resource created
204 No Content # Success, no body (DELETE)
400 Bad Request # Invalid input
401 Unauthorized # Not authenticated
403 Forbidden # Not authorized
404 Not Found # Resource doesn't exist
429 Too Many Requests # Rate limited
500 Internal Error # Server bug
Protect services from abuse and ensure fair usage.
| Algorithm | How It Works | Pros / Cons |
|-----------|--------------|-------------|
| Token Bucket | Bucket fills with tokens at fixed rate. Each request consumes a token. | Allows bursts. Most common (AWS, Stripe). |
| Leaky Bucket | Requests queue and are processed at a fixed rate. | Smooth output. No bursts allowed. |
| Fixed Window | Count requests per time window (e.g., 100/minute). | Simple. Boundary spikes possible. |
| Sliding Window Log | Track timestamps of all requests. Count within window. | Accurate but memory-heavy. |
| Sliding Window Counter | Combine fixed windows with weighted count. | Good balance of accuracy & efficiency. |
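The token bucket row above can be sketched with a lazily refilled counter: tokens accrue with elapsed time up to the bucket's capacity, so a full bucket permits a burst. The `TokenBucket` class is this sketch's own, not any provider's implementation.

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`;
    a full bucket allows a burst of `capacity` requests at once."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=10)
burst = [bucket.allow() for _ in range(11)]  # first 10 pass, 11th rejected
```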
Latency Numbers
| Operation | Latency |
|-----------|---------|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| RAM reference | ~100 ns |
| SSD random read | ~16 μs |
| HDD seek | ~2 ms |
| Network round trip (same datacenter) | ~0.5 ms |
| Network round trip (cross-continent) | ~150 ms |
Storage Units
| Unit | Size | Example |
|------|------|---------|
| 1 KB | 10³ bytes | A short text paragraph |
| 1 MB | 10⁶ bytes | A high-res photo |
| 1 GB | 10⁹ bytes | A movie (720p) |
| 1 TB | 10¹² bytes | ~500 hours of HD video |
| 1 PB | 10¹⁵ bytes | ~500 billion pages of text |
Quick Estimations
# Daily Active Users (DAU) to QPS
DAU = 10M users
Requests per user per day = 10
Total daily requests = 100M
QPS = 100M / 86,400 ≈ 1,157 QPS
Peak QPS ≈ 2× average ≈ 2,300 QPS
# Storage estimation
10M users × 1 KB per user = 10 GB (user data)
1M posts/day × 5 KB × 365 days = 1.8 TB/year
# Bandwidth
QPS × avg response size = bandwidth
1,000 QPS × 10 KB = 10 MB/s = 80 Mbps
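The arithmetic above checks out in a few lines of Python, using the same assumed inputs (10M DAU, 10 requests/user/day, 10 KB responses, 1 KB per user, 5 KB per post):

```python
# DAU to QPS
dau = 10_000_000
requests_per_user_per_day = 10
daily_requests = dau * requests_per_user_per_day  # 100M
qps = daily_requests / 86_400                     # ~1,157
peak_qps = 2 * qps                                # ~2,314

# Bandwidth: 1,000 QPS x 10 KB = 10 MB/s = 80 Mbps
bandwidth_mbps = 1_000 * 10 * 8 / 1_000

# Storage (decimal units: 1 GB = 1e6 KB, 1 TB = 1e9 KB)
user_storage_gb = dau * 1 / 1e6                       # 10 GB
post_storage_tb_per_year = 1_000_000 * 5 * 365 / 1e9  # ~1.8 TB
```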
URL Shortener (bit.ly)
- Write: Generate short ID (base62 encoding or hash), store mapping in DB
- Read: Lookup short URL → redirect to original. Cache popular URLs in Redis
- Key decisions: ID generation (counter vs hash), expiration policy, analytics tracking
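The base62 option above is a standard change-of-base encoding over `[0-9a-z-A-Z]`; here is a minimal sketch of the counter approach (the digit ordering of `ALPHABET` is a choice of this sketch):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode an auto-increment integer ID as a short base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def base62_decode(s: str) -> int:
    """Inverse mapping: short code back to the numeric ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# 7 characters cover 62**7 ≈ 3.5 trillion distinct URLs
short = base62_encode(125)  # "21"
```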
Chat System (WhatsApp)
- Transport: WebSocket for real-time bidirectional messaging
- Message flow: Sender → Chat Server → Message Queue → Recipient's Chat Server → Recipient
- Storage: Cassandra for messages (write-heavy, time-series). Redis for online status.
- Key decisions: Message ordering (timestamp + sequence), read receipts, group chat fanout
News Feed (Twitter/Instagram)
- Fanout-on-write: When user posts, push to all followers' feeds (fast reads, slow writes)
- Fanout-on-read: Build feed at read time by fetching from followed users (slow reads, fast writes)
- Hybrid: Fanout-on-write for most users, fanout-on-read for celebrities (>1M followers)
- Cache: Pre-compute and cache feeds in Redis sorted sets
Rate Limiter
- Where: API Gateway or dedicated middleware
- Algorithm: Token bucket (most common) or sliding window
- Storage: Redis (fast counter with TTL) – INCR + EXPIRE
- Distributed: Centralized Redis counter or race-condition-free Lua scripts
Notification System
- Types: Push (APNs/FCM), SMS (Twilio), Email (SES), In-app
- Architecture: Event → Notification Service → Priority Queue → Provider-specific workers
- Key decisions: De-duplication, user preferences, rate limiting per user, retry with exponential backoff
💡 System Design Tips
- Always start with requirements – don't jump to solutions
- No single "correct" design – trade-offs matter
- Think about: What happens when X fails?
- Read-heavy? → Cache aggressively, read replicas
- Write-heavy? → Message queues, async processing
- Global users? → CDN, multi-region, eventual consistency