System Design Cheatsheet
Scalability, load balancing, caching, databases, microservices & distributed systems fundamentals
A structured framework for tackling any system design question in 45-60 minutes.
💡 4-Step Framework
- Step 1 – Requirements (5 min): Functional (what does it do?) & Non-functional (scale, latency, availability)
- Step 2 – Estimation (5 min): Users, QPS, storage, bandwidth
- Step 3 – High-Level Design (15 min): Draw main components (client, LB, servers, DB, cache)
- Step 4 – Deep Dive (20 min): Discuss trade-offs, bottlenecks, database schema, APIs
Questions to Ask
- How many users? (DAU/MAU)
- Read-heavy or write-heavy?
- What's the expected latency? (p99 < 200ms?)
- Availability target? (99.9% = 8.7 hours downtime/year)
- Consistency requirement? (strong vs eventual)
- Geographic distribution?
Vertical Scaling (Scale Up)
- Add more CPU/RAM/disk to one machine
- Simple – no code changes
- Limited by hardware ceiling
- Single point of failure
- Good for: databases, early-stage apps
Horizontal Scaling (Scale Out)
- Add more machines to the pool
- Complex – needs load balancing, state management
- Virtually unlimited scaling
- Better fault tolerance
- Good for: web servers, microservices
+----------------------------------------------------------------+
|                    High-Level Architecture                     |
|                                                                |
|  Clients -> DNS -> CDN -> Load Balancer -> App Servers (N)     |
|                                 |               |              |
|                           Cache (Redis)   Message Queue        |
|                                 |               |              |
|                             Database       Workers (N)         |
|                       (Primary + Replicas)                     |
+----------------------------------------------------------------+
Distributes incoming traffic across multiple servers. Sits between clients and backend servers.
Types
| Type | Layer | How it works |
|------|-------|--------------|
| L4 (Transport) | TCP/UDP | Routes based on IP + port. Fast, no content inspection. (e.g., AWS NLB) |
| L7 (Application) | HTTP | Routes based on URL, headers, cookies. Can do SSL termination. (e.g., Nginx, ALB) |
Algorithms
| Algorithm | Description | Best For |
|-----------|-------------|----------|
| Round Robin | Requests distributed sequentially | Equal-capacity servers |
| Weighted Round Robin | Proportional to server weight | Different server specs |
| Least Connections | Send to server with fewest active connections | Long-lived connections |
| IP Hash | Hash client IP to pick server | Session affinity (sticky sessions) |
| Consistent Hashing | Minimize re-mapping when servers added/removed | Distributed caches, DB sharding |
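The consistent-hashing row above can be sketched as a hash ring with virtual nodes. This is a minimal illustration, not any particular library's API: the `ConsistentHashRing` class, the MD5 hash, and the 100-virtual-node default are all assumptions of this sketch.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Hash ring with virtual nodes: removing a server remaps only
    the keys that fell in that server's ring segments."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas   # virtual nodes per server
        self.ring = []             # sorted list of (hash, server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get(self, key: str) -> str:
        # First virtual node clockwise from the key's hash
        idx = bisect.bisect(self.ring, (_hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing()
for s in ("cache-1", "cache-2", "cache-3"):
    ring.add(s)
server = ring.get("user:42")  # same key always maps to the same server
```

Removing a node only remaps keys that were on that node, which is exactly the property that makes this scheme suit distributed caches and DB sharding.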
📌 Key Points
- Use health checks to remove unhealthy servers
- Use active-passive LB pairs for high availability
- SSL termination at LB reduces server load
Store frequently accessed data in fast storage (memory) to reduce latency and database load.
Cache Strategies
| Strategy | Read | Write | Use Case |
|----------|------|-------|----------|
| Cache-Aside | App checks cache → miss → read DB → populate cache | App writes to DB, invalidates cache | General purpose, most common |
| Read-Through | Cache checks DB on miss automatically | Same as cache-aside | Simpler app code |
| Write-Through | Same as cache-aside | Write to cache + DB synchronously | Strong consistency needed |
| Write-Behind | Same as cache-aside | Write to cache, async write to DB | Write-heavy, eventual consistency OK |
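The cache-aside row above can be sketched in a few lines. This is a toy illustration: the in-process dicts stand in for Redis and the database, and `get_user`/`update_user` are names invented here.

```python
import time

db = {"user:1": {"name": "Ada"}}           # stand-in for the database
cache: dict[str, tuple[float, dict]] = {}  # key -> (expiry, value)
TTL = 60.0

def get_user(key: str):
    """Cache-aside read: check cache, fall back to DB, populate cache."""
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                    # cache hit
    value = db.get(key)                    # cache miss -> read DB
    if value is not None:
        cache[key] = (time.time() + TTL, value)
    return value

def update_user(key: str, value: dict) -> None:
    """Cache-aside write: write the DB, then invalidate the cache entry."""
    db[key] = value
    cache.pop(key, None)                   # invalidate rather than update
```

Invalidating instead of updating on write avoids a race where a concurrent reader writes a stale value back into the cache.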
Eviction Policies
- LRU (Least Recently Used) – Most common. Evicts the least recently accessed item. Works well for most workloads.
- LFU (Least Frequently Used) – Evicts the least frequently accessed item. Better for skewed access patterns.
- TTL (Time To Live) – Expire after a fixed time. Simple, prevents stale data.
- FIFO – First in, first out. Simple but not access-aware.
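LRU eviction is easy to sketch with Python's `OrderedDict`, which keeps insertion/access order (a standard textbook construction; the `LRUCache` class name is this sketch's own).

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: most recently used items live at the end of
    the OrderedDict; eviction pops from the front."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```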
⚠️ Cache Problems
- Cache stampede: Many requests hit the DB when a cache entry expires → use a lock/semaphore or stale-while-revalidate
- Cache inconsistency: DB updated but cache stale → use short TTLs + invalidation
- Hot key: One key gets massive traffic → replicate across nodes or use a local cache
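The lock-based stampede protection mentioned above can be sketched with a single-flight lock: on a miss, one caller rebuilds the entry while the rest wait. All names here (`get`, `load_from_db`, the global dicts) are invented for the sketch; a production version would use a per-key lock or Redis.

```python
import threading
import time

_cache: dict[str, tuple[float, str]] = {}
_lock = threading.Lock()
db_calls = 0

def load_from_db(key: str) -> str:
    """Stand-in for a slow database read."""
    global db_calls
    db_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"

def get(key: str, ttl: float = 60.0) -> str:
    """On a miss, only the first caller rebuilds the entry;
    concurrent callers block on the lock instead of hitting the DB."""
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    with _lock:
        # Re-check: another thread may have repopulated while we waited
        entry = _cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        value = load_from_db(key)
        _cache[key] = (time.time() + ttl, value)
        return value

threads = [threading.Thread(target=get, args=("feed:home",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Ten concurrent misses, but the DB is read only once
```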
Where to Cache
- Client-side: Browser cache (HTTP headers: Cache-Control, ETag)
- CDN: Static content at edge locations (images, JS, CSS)
- Application-level: Redis/Memcached for API responses, session data
- Database: Query cache, materialized views
Network of edge servers geographically distributed to serve content closer to users.
Push CDN
- You upload content to CDN
- Content available before first request
- You control what's cached
- Good for: rarely changing content
Pull CDN
- CDN fetches from origin on first request
- Lazy population – less storage needed
- First request is slow (cache miss)
- Good for: dynamic, frequently changing content
Common CDNs: CloudFront (AWS), Cloudflare, Fastly, Akamai
SQL (Relational)
- Structured schema, tables, rows
- ACID transactions
- JOIN support
- Strong consistency
- Vertical scaling (primarily)
- Use for: Financial data, user accounts, relational data
- Examples: PostgreSQL, MySQL
NoSQL
- Flexible/schemaless
- BASE (Basically Available, Soft state, Eventually consistent)
- No complex JOINs
- Eventual consistency (usually)
- Horizontal scaling (built-in)
- Use for: Real-time analytics, social feeds, IoT data
- Examples: MongoDB, Cassandra, DynamoDB
NoSQL Types
| Type | Model | Examples | Best For |
|------|-------|----------|----------|
| Key-Value | Simple key–value pairs | Redis, DynamoDB | Caching, sessions, config |
| Document | JSON-like documents | MongoDB, CouchDB | Flexible schemas, CMS, catalogs |
| Column-Family | Columns grouped in families | Cassandra, HBase | Time-series, analytics, heavy writes |
| Graph | Nodes + edges | Neo4j, Neptune | Social networks, recommendation engines |
In a distributed system, you can only guarantee two out of three:
| Property | Meaning |
|----------|---------|
| C – Consistency | Every read gets the most recent write (all nodes see the same data) |
| A – Availability | Every request gets a response (even if it's not the latest data) |
| P – Partition Tolerance | The system continues operating despite network partitions between nodes |
Since network partitions are inevitable, you really choose between:
- CP (Consistency + Partition tolerance): System may reject requests during partition. Examples: MongoDB, HBase, ZooKeeper
- AP (Availability + Partition tolerance): System always responds but may return stale data. Examples: Cassandra, DynamoDB, CouchDB
📌 In practice: Most systems aren't purely CP or AP – they make trade-offs per operation. For example, a bank transfer needs CP, but a social media feed can use AP.
| Pattern | Description | Trade-off |
|---------|-------------|-----------|
| Strong Consistency | Read always returns latest write. All replicas synchronized. | Higher latency, lower throughput |
| Eventual Consistency | Replicas converge over time. Reads may be stale temporarily. | Lower latency, higher availability |
| Read-Your-Writes | User always sees their own writes immediately. | Balanced – good UX without full consistency |
| Causal Consistency | Operations that are causally related are seen in the same order by all. | Weaker than strong but preserves logical order |
Split data across multiple databases/servers to handle more data than a single machine can.
Strategies
| Strategy | How | Pros | Cons |
|----------|-----|------|------|
| Hash-based | shard = hash(key) % N | Even distribution | Hard to add/remove shards (re-hashing) |
| Range-based | Users A-M → Shard 1, N-Z → Shard 2 | Simple, range queries work | Hot spots if distribution is uneven |
| Consistent Hashing | Hash ring, minimal remapping | Easy to add/remove nodes | More complex implementation |
| Geo-based | Data by geographic region | Low latency for local users | Cross-region queries are expensive |
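The re-hashing cost in the hash-based row is easy to demonstrate: changing `N` in `hash(key) % N` remaps most keys, which is what consistent hashing avoids. The `shard_for` helper and SHA-1 choice are assumptions of this sketch.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: stable only for a fixed shard count."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % num_shards

keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
# Going from 4 to 5 shards remaps roughly 80% of keys, which is
# why resharding with modulo hashing is so painful in practice.
```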
⚠️ Sharding Challenges
- Cross-shard joins are very expensive → denormalize data
- Rebalancing when adding shards is complex
- Hot shards if data isn't evenly distributed (celebrity problem)
- Consider: do you really need sharding? Try read replicas + caching first.
Copy data across multiple servers for availability, fault tolerance, and read scalability.
| Type | How | Consistency | Use Case |
|------|-----|-------------|----------|
| Single-Leader | One primary handles writes, replicas handle reads | Strong (sync) or eventual (async) | Most common: PostgreSQL, MySQL |
| Multi-Leader | Multiple nodes accept writes | Conflict resolution needed | Multi-datacenter, offline-first apps |
| Leaderless | Any node can accept reads/writes (quorum-based) | Configurable (R + W > N) | Cassandra, DynamoDB |
Quorum reads/writes: With N replicas, write to W nodes, read from R nodes. If R + W > N, you're guaranteed to read latest write.
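The R + W > N overlap guarantee above can be verified by brute force for small replica counts: every read quorum must share at least one node with every write quorum. The `read_sees_latest` function is a name invented for this check.

```python
import itertools

def read_sees_latest(n: int, w: int, r: int) -> bool:
    """Return True if every read quorum of size r overlaps every
    write quorum of size w among n replicas."""
    nodes = range(n)
    for wq in itertools.combinations(nodes, w):
        for rq in itertools.combinations(nodes, r):
            if not set(wq) & set(rq):
                return False        # found disjoint quorums: stale read possible
    return True

assert read_sees_latest(3, 2, 2)      # R + W = 4 > 3: always overlaps
assert not read_sees_latest(3, 1, 1)  # R + W = 2 <= 3: stale reads possible
```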
Decouple producers and consumers. Enable async processing, load leveling, and fault tolerance.
Producer → [ Message Queue ] → Consumer(s)
           (Kafka, RabbitMQ, SQS)
Benefits:
• Decoupling: producer doesn't wait for consumer
• Buffering: handle traffic spikes gracefully
• Retry: failed messages can be re-processed
• Ordering: Kafka guarantees order within a partition
Queue vs Pub/Sub
| Feature | Message Queue | Pub/Sub |
|---------|---------------|---------|
| Model | Point-to-point | Broadcast to all subscribers |
| Consumer | One consumer per message | All subscribers get every message |
| Use case | Task processing, work distribution | Event notifications, real-time updates |
| Examples | SQS, RabbitMQ | Kafka topics, SNS, Redis Pub/Sub |
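The point-to-point model can be sketched in-process with Python's `queue.Queue` standing in for the broker (SQS/RabbitMQ); the producer enqueues and moves on while a worker drains the queue. The `None` sentinel shutdown is a convention of this sketch.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()   # stand-in for the broker
results = []

def worker() -> None:
    """Consumer: pull messages until the None sentinel arrives."""
    while True:
        msg = tasks.get()
        if msg is None:
            break
        results.append(f"processed {msg}")

t = threading.Thread(target=worker)
t.start()
for i in range(3):                   # producer enqueues without waiting
    tasks.put(f"job-{i}")
tasks.put(None)                      # signal shutdown
t.join()
```

Each message is consumed exactly once here; in the pub/sub model every subscriber would instead get its own copy.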
Monolith
- Single deployable unit
- Shared database
- Simple deployment & debugging
- All-or-nothing scaling
- Good for: small teams, MVPs, simple apps
Microservices
- Multiple independent services
- Own database per service
- Independent deployment & scaling
- Complex: needs service discovery, distributed tracing
- Good for: large teams, complex domains, scale needs
Communication Patterns
| Pattern | Type | Description |
|---------|------|-------------|
| REST/HTTP | Synchronous | Simple, widely understood. Tight coupling. |
| gRPC | Synchronous | Binary protocol, auto-generated clients. Faster than REST. |
| Message Queue | Asynchronous | Decoupled, resilient. More complex debugging. |
| Event Sourcing | Asynchronous | Store events as source of truth. Full audit trail. |
Key Patterns
- API Gateway: Single entry point. Handles routing, auth, rate limiting. (Kong, AWS API Gateway)
- Service Discovery: Services find each other dynamically. (Consul, Kubernetes DNS)
- Circuit Breaker: Stop calling a failing service. Prevent cascading failures. (Resilience4j)
- Saga Pattern: Manage distributed transactions. Each service runs a local transaction + compensating action on failure.
- CQRS: Separate read and write models. Optimize each independently.
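The circuit-breaker pattern above can be sketched with a failure counter and a cooldown; this is a minimal illustration (the `CircuitBreaker` class is invented here), not the Resilience4j API, which also distinguishes an explicit half-open state.

```python
import time

class CircuitBreaker:
    """Open after max_failures consecutive failures; fail fast while
    open; after reset_after seconds, let one probe call through."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # cooldown elapsed: half-open, allow one probe request
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            self.opened_at = time.time()
            raise
        self.failures = 0            # a success closes the circuit
        return result
```

Failing fast while the circuit is open is what stops a slow, dying dependency from tying up every caller's threads and cascading the outage upstream.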
| Style | Data Format | Best For |
|-------|-------------|----------|
| REST | JSON over HTTP | CRUD APIs, web apps, public APIs |
| GraphQL | Query language | Flexible queries, mobile apps (reduce over-fetching) |
| gRPC | Protocol Buffers | Internal microservice communication, streaming |
| WebSocket | Full-duplex | Real-time: chat, live updates, gaming |
REST Best Practices
# Use nouns, not verbs
GET /api/v1/users # list users
GET /api/v1/users/123 # get user 123
POST /api/v1/users # create user
PUT /api/v1/users/123 # update user 123 (full replace)
PATCH /api/v1/users/123 # partial update
DELETE /api/v1/users/123 # delete user 123
# Pagination
GET /api/v1/users?page=2&limit=20
GET /api/v1/users?cursor=abc123&limit=20 # cursor-based (better for large datasets)
# Filtering & Sorting
GET /api/v1/users?status=active&sort=-created_at
# Status codes
200 OK # Success
201 Created # Resource created
204 No Content # Success, no body (DELETE)
400 Bad Request # Invalid input
401 Unauthorized # Not authenticated
403 Forbidden # Not authorized
404 Not Found # Resource doesn't exist
429 Too Many Requests # Rate limited
500 Internal Error # Server bug
Protect services from abuse and ensure fair usage.
| Algorithm | How It Works | Pros / Cons |
|-----------|--------------|-------------|
| Token Bucket | Bucket fills with tokens at fixed rate. Each request consumes a token. | Allows bursts. Most common (AWS, Stripe). |
| Leaky Bucket | Requests queue and are processed at a fixed rate. | Smooth output. No bursts allowed. |
| Fixed Window | Count requests per time window (e.g., 100/minute). | Simple. Boundary spikes possible. |
| Sliding Window Log | Track timestamps of all requests. Count within window. | Accurate but memory-heavy. |
| Sliding Window Counter | Combine fixed windows with weighted count. | Good balance of accuracy & efficiency. |
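The token bucket row above can be sketched with a lazily refilled counter: tokens accrue with elapsed time up to the bucket's capacity, so a full bucket permits a burst. The `TokenBucket` class is this sketch's own, not any provider's implementation.

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`;
    a full bucket allows a burst of `capacity` requests at once."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=10)
burst = [bucket.allow() for _ in range(11)]  # first 10 pass, 11th rejected
```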
Latency Numbers
| Operation | Latency |
|-----------|---------|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| RAM reference | ~100 ns |
| SSD random read | ~16 μs |
| HDD seek | ~2 ms |
| Network round trip (same datacenter) | ~0.5 ms |
| Network round trip (cross-continent) | ~150 ms |
Storage Units
| Unit | Size | Example |
|------|------|---------|
| 1 KB | 10³ bytes | A short text paragraph |
| 1 MB | 10⁶ bytes | A high-res photo |
| 1 GB | 10⁹ bytes | A movie (720p) |
| 1 TB | 10¹² bytes | ~500 hours of HD video |
| 1 PB | 10¹⁵ bytes | ~500 billion pages of text |
Quick Estimations
# Daily Active Users (DAU) to QPS
DAU = 10M users
Requests per user per day = 10
Total daily requests = 100M
QPS = 100M / 86,400 ≈ 1,157 QPS
Peak QPS ≈ 2× average ≈ 2,300 QPS
# Storage estimation
10M users × 1 KB per user = 10 GB (user data)
1M posts/day × 5 KB × 365 days = 1.8 TB/year
# Bandwidth
QPS × avg response size = bandwidth
1,000 QPS × 10 KB = 10 MB/s = 80 Mbps
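The arithmetic above checks out in a few lines of Python, using the same assumed inputs (10M DAU, 10 requests/user/day, 10 KB responses, 1 KB per user, 5 KB per post):

```python
# DAU to QPS
dau = 10_000_000
requests_per_user_per_day = 10
daily_requests = dau * requests_per_user_per_day  # 100M
qps = daily_requests / 86_400                     # ~1,157
peak_qps = 2 * qps                                # ~2,314

# Bandwidth: 1,000 QPS x 10 KB = 10 MB/s = 80 Mbps
bandwidth_mbps = 1_000 * 10 * 8 / 1_000

# Storage (decimal units: 1 GB = 1e6 KB, 1 TB = 1e9 KB)
user_storage_gb = dau * 1 / 1e6                       # 10 GB
post_storage_tb_per_year = 1_000_000 * 5 * 365 / 1e9  # ~1.8 TB
```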
URL Shortener (bit.ly)
- Write: Generate short ID (base62 encoding or hash), store mapping in DB
- Read: Lookup short URL → redirect to original. Cache popular URLs in Redis
- Key decisions: ID generation (counter vs hash), expiration policy, analytics tracking
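The base62 option above is a standard change-of-base encoding over `[0-9a-z-A-Z]`; here is a minimal sketch of the counter approach (the digit ordering of `ALPHABET` is a choice of this sketch):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode an auto-increment integer ID as a short base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def base62_decode(s: str) -> int:
    """Inverse mapping: short code back to the numeric ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# 7 characters cover 62**7 ≈ 3.5 trillion distinct URLs
short = base62_encode(125)  # "21"
```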
Chat System (WhatsApp)
- Transport: WebSocket for real-time bidirectional messaging
- Message flow: Sender → Chat Server → Message Queue → Recipient's Chat Server → Recipient
- Storage: Cassandra for messages (write-heavy, time-series). Redis for online status.
- Key decisions: Message ordering (timestamp + sequence), read receipts, group chat fanout
News Feed (Twitter/Instagram)
- Fanout-on-write: When user posts, push to all followers' feeds (fast reads, slow writes)
- Fanout-on-read: Build feed at read time by fetching from followed users (slow reads, fast writes)
- Hybrid: Fanout-on-write for most users, fanout-on-read for celebrities (>1M followers)
- Cache: Pre-compute and cache feeds in Redis sorted sets
Rate Limiter
- Where: API Gateway or dedicated middleware
- Algorithm: Token bucket (most common) or sliding window
- Storage: Redis (fast counter with TTL) – INCR + EXPIRE
- Distributed: Centralized Redis counter or race-condition-free Lua scripts
Notification System
- Types: Push (APNs/FCM), SMS (Twilio), Email (SES), In-app
- Architecture: Event → Notification Service → Priority Queue → Provider-specific workers
- Key decisions: De-duplication, user preferences, rate limiting per user, retry with exponential backoff
💡 System Design Tips
- Always start with requirements – don't jump to solutions
- No single "correct" design – trade-offs matter
- Think about: What happens when X fails?
- Read-heavy? → Cache aggressively, read replicas
- Write-heavy? → Message queues, async processing
- Global users? → CDN, multi-region, eventual consistency