System Design Cheatsheet

Scalability, load balancing, caching, databases, microservices & distributed systems fundamentals


System Design Approach

A structured framework for tackling any system design question in 45-60 minutes.

4-Step Framework
  • Step 1 - Requirements (5 min): Functional (what does it do?) & Non-functional (scale, latency, availability)
  • Step 2 - Estimation (5 min): Users, QPS, storage, bandwidth
  • Step 3 - High-Level Design (15 min): Draw main components (client, LB, servers, DB, cache)
  • Step 4 - Deep Dive (20 min): Discuss trade-offs, bottlenecks, database schema, APIs

Questions to Ask


Scaling

Vertical Scaling (Scale Up)

  • Add more CPU/RAM/disk to one machine
  • Simple: no code changes
  • Limited by hardware ceiling
  • Single point of failure
  • Good for: databases, early-stage apps

Horizontal Scaling (Scale Out)

  • Add more machines to the pool
  • Complex: needs load balancing, state management
  • Virtually unlimited scaling
  • Better fault tolerance
  • Good for: web servers, microservices
┌────────────────────────────────────────────────────────────────┐
│                    High-Level Architecture                     │
│                                                                │
│  Clients → DNS → CDN → Load Balancer → App Servers (N)         │
│                                        ↓             ↓         │
│                                  Cache (Redis) Message Queue   │
│                                        ↓             ↓         │
│                                    Database     Workers (N)    │
│                              (Primary + Replicas)              │
└────────────────────────────────────────────────────────────────┘

Load Balancers

Distributes incoming traffic across multiple servers. Sits between clients and backend servers.

Types

Type | Layer | How it works
L4 (Transport) | TCP/UDP | Routes based on IP + port. Fast, no content inspection. (e.g., AWS NLB)
L7 (Application) | HTTP | Routes based on URL, headers, cookies. Can do SSL termination. (e.g., Nginx, ALB)

Algorithms

Algorithm | Description | Best For
Round Robin | Requests distributed sequentially | Equal-capacity servers
Weighted Round Robin | Proportional to server weight | Different server specs
Least Connections | Send to server with fewest active connections | Long-lived connections
IP Hash | Hash client IP to pick server | Session affinity (sticky sessions)
Consistent Hashing | Minimize re-mapping when servers added/removed | Distributed caches, DB sharding
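
As a rough illustration of two of the algorithms above, here is a minimal in-memory sketch of Round Robin and Least Connections. The class names, the `pick`/`release` methods, and the server names are hypothetical and chosen only for this example; a real load balancer tracks connections at the network layer.

```python
import itertools


class RoundRobin:
    """Cycle through the servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server with the lowest count,
        # so ties are broken by insertion order
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes
        self.active[server] -= 1


rr = RoundRobin(["app-1", "app-2", "app-3"])
rr_picks = [rr.pick() for _ in range(4)]  # wraps around after the third server

lc = LeastConnections(["app-1", "app-2"])
first, second = lc.pick(), lc.pick()  # one connection on each server
lc.release(first)                     # the first server finishes its request
third = lc.pick()                     # so it is the least loaded again
```

Least Connections needs the `release` call to stay accurate, which is why it suits long-lived connections where the connection count actually reflects load.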
Key Points
  • Use health checks to remove unhealthy servers
  • Use active-passive LB pairs for high availability
  • SSL termination at LB reduces server load

Caching

Store frequently accessed data in fast storage (memory) to reduce latency and database load.

Cache Strategies

Strategy | Read | Write | Use Case
Cache-Aside | App checks cache → miss → read DB → populate cache | App writes to DB, invalidates cache | General purpose, most common
Read-Through | Cache checks DB on miss automatically | Same as cache-aside | Simpler app code
Write-Through | Same as cache-aside | Write to cache + DB synchronously | Strong consistency needed
Write-Behind | Same as cache-aside | Write to cache, async write to DB | Write-heavy, eventual consistency OK
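
The cache-aside read and write paths can be sketched as follows. This is a minimal in-process version under stated assumptions: a plain dict stands in for the database, the cache is a local dict rather than Redis, and the `CacheAside` class and its `get`/`put` methods are hypothetical names for illustration.

```python
import time


class CacheAside:
    def __init__(self, db, ttl_seconds=60.0):
        self.db = db              # stand-in for a real database (a dict here)
        self.ttl = ttl_seconds
        self._cache = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = self.db[key]                     # miss: read from the DB
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                     # write to the DB first
        self._cache.pop(key, None)               # then invalidate the cached copy


db = {"user:1": "Ada"}
store = CacheAside(db)
first_read = store.get("user:1")    # miss, loaded from db and cached
store.put("user:1", "Grace")        # DB updated, cache entry dropped
second_read = store.get("user:1")   # miss again, so the new value is read
```

Invalidating on write (rather than updating the cache) keeps the write path simple; the next read repopulates the entry.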

Eviction Policies

โš ๏ธ Cache Problems
  • Cache stampede: Many requests hit DB when cache expires โ†’ Use lock/semaphore or stale-while-revalidate
  • Cache inconsistency: DB updated but cache stale โ†’ Use short TTL + invalidation
  • Hot key: One key gets massive traffic โ†’ Replicate across nodes or use local cache
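
The lock-based fix for a cache stampede can be sketched like this: one lock per key, so that only the first caller on a miss runs the expensive load while the others wait and reuse its result. `SingleFlightCache` and `loader` are hypothetical names, and TTL expiry is left out to keep the example short.

```python
import threading


class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader             # hypothetical slow function, e.g. a DB query
        self._values = {}
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Guard the lock table itself so two threads never create two locks for one key
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._values:           # fast path: no per-key lock on a hit
            return self._values[key]
        with self._lock_for(key):
            if key not in self._values:   # re-check: another thread may have filled it
                self._values[key] = self._loader(key)
            return self._values[key]
```

With several threads calling `get` for the same cold key at once, the loader should run only once; the remaining callers block briefly and then read the cached value.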

Where to Cache


CDN (Content Delivery Network)

Network of edge servers geographically distributed to serve content closer to users.

Push CDN

  • You upload content to CDN
  • Content available before first request
  • You control what's cached
  • Good for: rarely changing content

Pull CDN

  • CDN fetches from origin on first request
  • Lazy population: less storage needed
  • First request is slow (cache miss)
  • Good for: dynamic, frequently changing content

Common CDNs: CloudFront (AWS), Cloudflare, Fastly, Akamai


Database Design

SQL (Relational)

  • Structured schema, tables, rows
  • ACID transactions
  • JOIN support
  • Strong consistency
  • Vertical scaling (primarily)
  • Use for: Financial data, user accounts, relational data
  • Examples: PostgreSQL, MySQL

NoSQL

  • Flexible/schemaless
  • BASE (Basically Available, Soft state, Eventually consistent)
  • No complex JOINs
  • Eventual consistency (usually)
  • Horizontal scaling (built-in)
  • Use for: Real-time analytics, social feeds, IoT data
  • Examples: MongoDB, Cassandra, DynamoDB

NoSQL Types

Type | Model | Examples | Best For
Key-Value | Simple key→value pairs | Redis, DynamoDB | Caching, sessions, config
Document | JSON-like documents | MongoDB, CouchDB | Flexible schemas, CMS, catalogs
Column-Family | Columns grouped in families | Cassandra, HBase | Time-series, analytics, heavy writes
Graph | Nodes + edges | Neo4j, Neptune | Social networks, recommendation engines

CAP Theorem

In a distributed system, you can only guarantee two out of three:

Property | Meaning
C - Consistency | Every read gets the most recent write (all nodes see same data)
A - Availability | Every request gets a response (even if it's not the latest data)
P - Partition Tolerance | System continues operating despite network partitions between nodes

Since network partitions are inevitable, the real choice during a partition is between consistency (CP) and availability (AP).

In practice: Most systems aren't purely CP or AP; they make trade-offs per operation. For example, a bank transfer needs CP, but a social media feed can use AP.

Consistency Patterns

Pattern | Description | Trade-off
Strong Consistency | Read always returns latest write. All replicas synchronized. | Higher latency, lower throughput
Eventual Consistency | Replicas converge over time. Reads may be stale temporarily. | Lower latency, higher availability
Read-Your-Writes | User always sees their own writes immediately. | Balanced: good UX without full consistency
Causal Consistency | Operations that are causally related are seen in same order by all. | Weaker than strong but preserves logical order

Partitioning & Sharding

Split data across multiple databases/servers to handle more data than a single machine can.

Strategies

Strategy | How | Pros | Cons
Hash-based | shard = hash(key) % N | Even distribution | Hard to add/remove shards (re-hashing)
Range-based | Users A-M → Shard 1, N-Z → Shard 2 | Simple, range queries work | Hot spots if distribution is uneven
Consistent Hashing | Hash ring, minimal remapping | Easy to add/remove nodes | More complex implementation
Geo-based | Data by geographic region | Low latency for local users | Cross-region queries are expensive
Sharding Challenges
  • Cross-shard joins are very expensive → denormalize data
  • Rebalancing when adding shards is complex
  • Hot shards if data isn't evenly distributed (celebrity problem)
  • Consider: do you really need sharding? Try read replicas + caching first.
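
To make the rebalancing cost concrete, the sketch below compares `hash(key) % N` with a small consistent-hash ring when a fifth shard is added. It is an illustration under assumptions: the `HashRing` class, the choice of 100 virtual nodes per shard, and the shard names are all hypothetical, and MD5 is used only as a stable, well-spread hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # md5 gives a stable hash across runs; it is not used for security here
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []                       # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring many times to even out the load
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First virtual node clockwise from the key's position, wrapping around
        idx = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]

# Modulo sharding: going from 4 to 5 shards changes the shard of most keys
mod_moved = sum(_hash(k) % 4 != _hash(k) % 5 for k in keys)

# Hash ring: adding a fifth node only moves keys onto that new node
ring = HashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
before = {k: ring.node_for(k) for k in keys}
ring.add("shard-5")
after = {k: ring.node_for(k) for k in keys}
ring_moved = [k for k in keys if before[k] != after[k]]

print(f"modulo moved {mod_moved} keys, ring moved {len(ring_moved)} keys")
```

With modulo sharding roughly four out of five keys change shard in this scenario, while the ring moves only the share of keys that the new node takes over.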

Replication

Copy data across multiple servers for availability, fault tolerance, and read scalability.

Type | How | Consistency | Use Case
Single-Leader | One primary handles writes, replicas handle reads | Strong (sync) or eventual (async) | Most common: PostgreSQL, MySQL
Multi-Leader | Multiple nodes accept writes | Conflict resolution needed | Multi-datacenter, offline-first apps
Leaderless | Any node can accept reads/writes (quorum-based) | Configurable (R + W > N) | Cassandra, Dynamo-style stores (e.g., Riak)

Quorum reads/writes: With N replicas, each write must be acknowledged by W nodes and each read queries R nodes. If R + W > N, every read set overlaps every write set in at least one node, so a read sees the latest acknowledged write.
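
The overlap argument behind R + W > N can be checked by brute force for small clusters. The function name below is hypothetical; it simply enumerates every possible write set and read set and tests whether they always share a node.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)      # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


print(quorums_always_overlap(3, 2, 2))  # W + R = 4 > 3
print(quorums_always_overlap(3, 1, 2))  # W + R = 3, not greater than N
```

A common configuration is N = 3, W = 2, R = 2, which tolerates one unavailable replica for both reads and writes while still overlapping.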


Message Queues

Decouple producers and consumers. Enable async processing, load leveling, and fault tolerance.

Producer → [ Message Queue ] → Consumer(s)
(Kafka, RabbitMQ, SQS)

Benefits:
  • Decoupling: producer doesn't wait for consumer
  • Buffering: handle traffic spikes gracefully
  • Retry: failed messages can be re-processed
  • Ordering: Kafka guarantees order within a partition

Queue vs Pub/Sub

Feature | Message Queue | Pub/Sub
Model | Point-to-point | Broadcast to all subscribers
Consumer | One consumer per message | All subscribers get every message
Use case | Task processing, work distribution | Event notifications, real-time updates
Examples | SQS, RabbitMQ | Kafka topics, SNS, Redis Pub/Sub
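
The difference in delivery semantics can be shown with two tiny in-memory classes. This is only a toy model with hypothetical names (`WorkQueue`, `Topic`); real brokers add persistence, acknowledgements, and retries.

```python
from collections import deque


class WorkQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Removing the message means no other consumer can receive it
        return self._messages.popleft() if self._messages else None


class Topic:
    """Pub/sub: every subscriber receives its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


queue = WorkQueue()
queue.publish("resize-image-42")
worker_a = queue.consume()    # gets the task
worker_b = queue.consume()    # nothing left: the task is not delivered twice

inbox_a, inbox_b = [], []
topic = Topic()
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("user-signed-up")   # both subscribers receive the event
```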

Microservices

Monolith

  • Single deployable unit
  • Shared database
  • Simple deployment & debugging
  • All-or-nothing scaling
  • Good for: small teams, MVPs, simple apps

Microservices

  • Multiple independent services
  • Own database per service
  • Independent deployment & scaling
  • Complex: needs service discovery, distributed tracing
  • Good for: large teams, complex domains, scale needs

Communication Patterns

Pattern | Type | Description
REST/HTTP | Synchronous | Simple, widely understood. Tight coupling.
gRPC | Synchronous | Binary protocol (Protocol Buffers over HTTP/2), auto-generated clients. Typically faster than JSON over REST.
Message Queue | Asynchronous | Decoupled, resilient. More complex debugging.
Event Sourcing | Asynchronous | Store events as source of truth. Full audit trail.

Key Patterns


API Design

Style | Data Format | Best For
REST | JSON over HTTP | CRUD APIs, web apps, public APIs
GraphQL | Query language | Flexible queries, mobile apps (reduce over-fetching)
gRPC | Protocol Buffers | Internal microservice communication, streaming
WebSocket | Full-duplex | Real-time: chat, live updates, gaming

REST Best Practices

# Use nouns, not verbs
GET    /api/v1/users          # list users
GET    /api/v1/users/123      # get user 123
POST   /api/v1/users          # create user
PUT    /api/v1/users/123      # update user 123 (full replace)
PATCH  /api/v1/users/123      # partial update
DELETE /api/v1/users/123      # delete user 123

# Pagination
GET /api/v1/users?page=2&limit=20
GET /api/v1/users?cursor=abc123&limit=20   # cursor-based (better for large datasets)

# Filtering & Sorting
GET /api/v1/users?status=active&sort=-created_at

# Status codes
200 OK              # Success
201 Created         # Resource created
204 No Content      # Success, no body (DELETE)
400 Bad Request     # Invalid input
401 Unauthorized    # Not authenticated
403 Forbidden       # Not authorized
404 Not Found       # Resource doesn't exist
429 Too Many Requests  # Rate limited
500 Internal Server Error  # Server bug
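
As a sketch of how the cursor-based pagination above might work on the server side, the example below encodes the last seen id into an opaque cursor. The `list_users` function, the `last_id` field, and the cursor encoding are assumptions for illustration; the list filter stands in for a query such as `WHERE id > :last_id ORDER BY id LIMIT :limit`.

```python
import base64
import json


def encode_cursor(last_id: int) -> str:
    # Opaque to clients: they pass it back unchanged
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def list_users(users, cursor=None, limit=20):
    """users must be sorted by a positive integer id; returns (page, next_cursor)."""
    last_id = decode_cursor(cursor) if cursor else 0
    page = [u for u in users if u["id"] > last_id][:limit]
    # A full page suggests more rows may follow; a short page is the last one
    next_cursor = encode_cursor(page[-1]["id"]) if len(page) == limit else None
    return page, next_cursor


users = [{"id": i} for i in range(1, 46)]        # 45 users
page1, cursor1 = list_users(users, limit=20)      # ids 1..20
page2, cursor2 = list_users(users, cursor1, 20)   # ids 21..40
page3, cursor3 = list_users(users, cursor2, 20)   # ids 41..45, no further cursor
```

Unlike `page=2`, the cursor stays stable when rows are inserted or deleted between requests, and the query does not have to skip over earlier rows.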

Rate Limiting

Protect services from abuse and ensure fair usage.

Algorithm | How It Works | Pros / Cons
Token Bucket | Bucket fills with tokens at fixed rate. Each request consumes a token. | Allows bursts. Most common (AWS, Stripe).
Leaky Bucket | Requests queue and are processed at a fixed rate. | Smooth output. No bursts allowed.
Fixed Window | Count requests per time window (e.g., 100/minute). | Simple. Boundary spikes possible.
Sliding Window Log | Track timestamps of all requests. Count within window. | Accurate but memory-heavy.
Sliding Window Counter | Combine fixed windows with weighted count. | Good balance of accuracy & efficiency.
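
A token bucket can be sketched in a few lines. This is a single-process illustration, not a production limiter: the `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are hypothetical names, and a distributed limiter would keep the counters in a shared store.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock           # injectable so the example can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the elapsed time, never exceeding capacity
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=1, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]     # three allowed, the fourth rejected
fake_now[0] = 2.0                              # two seconds pass, two tokens refill
later = [bucket.allow() for _ in range(3)]     # two allowed, the third rejected
```

The capacity sets how large a burst is accepted, while the rate sets the sustained throughput once the bucket is drained.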

Back-of-Envelope Math

Latency Numbers

Operation | Latency
L1 cache reference | ~1 ns
L2 cache reference | ~4 ns
RAM reference | ~100 ns
SSD random read | ~16 μs
HDD seek | ~2 ms
Network round trip (same datacenter) | ~0.5 ms
Network round trip (cross-continent) | ~150 ms

Storage Units

Unit | Size | Example
1 KB | 10³ bytes | A short text paragraph
1 MB | 10⁶ bytes | A high-res photo
1 GB | 10⁹ bytes | A movie (720p)
1 TB | 10¹² bytes | ~500 hours of HD video
1 PB | 10¹⁵ bytes | ~500 billion pages of text

Quick Estimations

# Daily Active Users (DAU) to QPS
DAU = 10M users
Requests per user per day = 10
Total daily requests = 100M
QPS = 100M / 86400 ≈ 1,157 QPS
Peak QPS ≈ 2× average ≈ 2,300 QPS

# Storage estimation
10M users × 1 KB per user = 10 GB (user data)
1M posts/day × 5 KB × 365 days = 1.8 TB/year

# Bandwidth
QPS × avg response size = bandwidth
1000 QPS × 10 KB = 10 MB/s = 80 Mbps
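
The same arithmetic can be reproduced in a few lines of Python, which makes it easy to swap in different inputs. The variable names are illustrative, decimal units are assumed (1 KB = 1,000 bytes), and the 2x peak factor is the rule of thumb used above.

```python
SECONDS_PER_DAY = 86_400

# DAU to QPS
dau = 10_000_000
requests_per_user = 10
avg_qps = dau * requests_per_user / SECONDS_PER_DAY
peak_qps = 2 * avg_qps                        # assumes peak is about twice the average

# Storage
user_data_gb = dau * 1_000 / 1e9              # 1 KB per user
posts_tb_per_year = 1_000_000 * 5_000 * 365 / 1e12

# Bandwidth
bandwidth_mb_s = 1_000 * 10_000 / 1e6         # 1000 QPS x 10 KB responses
bandwidth_mbps = bandwidth_mb_s * 8           # bytes to bits

print(f"avg QPS ~ {avg_qps:,.0f}, peak QPS ~ {peak_qps:,.0f}")
print(f"user data ~ {user_data_gb:.0f} GB, posts ~ {posts_tb_per_year:.1f} TB/year")
print(f"bandwidth ~ {bandwidth_mb_s:.0f} MB/s = {bandwidth_mbps:.0f} Mbps")
```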
๐Ÿ—๏ธ

Common System Designs

URL Shortener (bit.ly)

Chat System (WhatsApp)

News Feed (Twitter/Instagram)

Rate Limiter

Notification System

System Design Tips
  • Always start with requirements; don't jump to solutions
  • No single "correct" design; trade-offs matter
  • Think about: What happens when X fails?
  • Read-heavy? → Cache aggressively, read replicas
  • Write-heavy? → Message queues, async processing
  • Global users? → CDN, multi-region, eventual consistency