NoSQL Cheatsheet

MongoDB, Redis, DynamoDB — CRUD, data modeling, indexing, caching patterns & when to use what

Database / NoSQL

Contents

NoSQL Overview NoSQL Types SQL vs NoSQL MongoDB Basics MongoDB CRUD MongoDB Queries MongoDB Aggregation MongoDB Indexes MongoDB Schema Design Redis Basics Redis Data Types Redis Patterns DynamoDB Choosing the Right DB

📦

NoSQL Overview

NoSQL (Not Only SQL) databases are designed for specific data models and have flexible schemas. They excel at scalability, performance, and handling unstructured or semi-structured data.

When to Use NoSQL

Flexible schema: Data structure changes frequently
Horizontal scaling: Need to scale across many servers
High throughput: Millions of reads/writes per second
Specific access patterns: Key-value lookups, time-series, graphs
Denormalized data: Embedding related data to avoid JOINs

🗂️

NoSQL Types

Type	Description	Examples	Best For
Document	JSON/BSON documents with nested structures	MongoDB, CouchDB, Firestore	Content management, catalogs, user profiles
Key-Value	Simple key → value pairs, extremely fast	Redis, DynamoDB, etcd	Caching, sessions, config, rate limiting
Column-Family	Data stored in column families, wide rows	Cassandra, HBase, ScyllaDB	Time-series, IoT, analytics at scale
Graph	Nodes + edges with properties	Neo4j, Amazon Neptune	Social networks, recommendations, fraud detection
Search Engine	Full-text search with inverted indexes	Elasticsearch, Solr	Log analysis, search, autocomplete

⚖️

SQL vs NoSQL

Aspect	SQL (PostgreSQL, MySQL)	NoSQL (MongoDB, Redis)
Schema	Fixed schema, migrations needed	Flexible / schema-less
Query Language	SQL (standardized)	Database-specific APIs
Relationships	JOINs — normalized data	Embedded / denormalized data
Scaling	Vertical (scale up)	Horizontal (scale out / sharding)
Transactions	Strong ACID	Varies (eventual to strong consistency)
Data Model	Tables with rows & columns	Documents, key-value, columns, graphs
Best For	Complex queries, relationships, reporting	High scale, flexible data, specific access patterns

💡 Not either/or! Many modern systems use polyglot persistence — SQL for transactions, Redis for caching, Elasticsearch for search, MongoDB for flexible documents.

🍃

MongoDB Basics

MongoDB stores data as BSON documents (binary JSON) in collections (≈ tables).

Terminology Mapping

SQL	MongoDB
Database	Database
Table	Collection
Row	Document
Column	Field
Primary Key	`_id` (auto-generated ObjectId)
JOIN	`$lookup` (aggregation) or embed
Index	Index

// Connect via mongosh
mongosh "mongodb://localhost:27017/mydb"

// Basic shell commands
show dbs                     // list databases
use mydb                     // switch database
show collections             // list collections
db.users.stats()             // collection statistics
db.dropDatabase()            // drop current database

✏️

MongoDB CRUD

// ── INSERT ──
db.users.insertOne({
    name: "Alice",
    email: "alice@example.com",
    age: 30,
    tags: ["admin", "developer"],
    address: { city: "NYC", zip: "10001" }
});

db.users.insertMany([
    { name: "Bob", age: 25 },
    { name: "Charlie", age: 35 }
]);

// ── READ ──
db.users.find({});                          // all documents
db.users.findOne({ email: "alice@example.com" });
db.users.find({ age: { $gte: 25 } });       // age >= 25
db.users.find({}, { name: 1, email: 1 });   // projection (only name & email)
db.users.find({}).sort({ age: -1 }).limit(10);

// ── UPDATE ──
db.users.updateOne(
    { email: "alice@example.com" },
    { $set: { age: 31, role: "admin" } }
);

db.users.updateMany(
    { age: { $lt: 18 } },
    { $set: { status: "minor" } }
);

// Upsert (insert if not found)
db.users.updateOne(
    { email: "new@user.com" },
    { $set: { name: "New User" } },
    { upsert: true }
);

// ── DELETE ──
db.users.deleteOne({ email: "bob@example.com" });
db.users.deleteMany({ age: { $lt: 18 } });
db.users.drop();  // delete entire collection

🔍

MongoDB Query Operators

Comparison Operators

Operator	Description	Example
`$eq`	Equal	`{ age: { $eq: 25 } }`
`$ne`	Not equal	`{ status: { $ne: "inactive" } }`
`$gt / $gte`	Greater than / >=	`{ age: { $gte: 18 } }`
`$lt / $lte`	Less than / <=	`{ price: { $lt: 100 } }`
`$in`	In array	`{ role: { $in: ["admin", "mod"] } }`
`$nin`	Not in array	`{ role: { $nin: ["banned"] } }`

Logical Operators

// AND (implicit)
db.users.find({ age: { $gte: 18 }, role: "admin" });

// OR
db.users.find({ $or: [{ age: { $lt: 18 } }, { role: "minor" }] });

// NOT
db.users.find({ age: { $not: { $lt: 18 } } });

// Array queries
db.users.find({ tags: "admin" });                    // has "admin" in array
db.users.find({ tags: { $all: ["admin", "dev"] } }); // has both
db.users.find({ tags: { $size: 3 } });                // array length = 3
db.users.find({ "tags.0": "admin" });                 // first element

// Nested document query
db.users.find({ "address.city": "NYC" });

// Exists
db.users.find({ phone: { $exists: true } });

// Regex
db.users.find({ name: { $regex: /^ali/i } });

📊

MongoDB Aggregation Pipeline

The aggregation pipeline processes documents through a sequence of stages — similar to Unix pipes.

db.orders.aggregate([
    // Stage 1: Filter
    { $match: { status: "completed" } },

    // Stage 2: Group by user and calculate stats
    { $group: {
        _id: "$user_id",
        totalSpent: { $sum: "$total" },
        orderCount: { $sum: 1 },
        avgOrder:   { $avg: "$total" }
    }},

    // Stage 3: Sort by total spent descending
    { $sort: { totalSpent: -1 } },

    // Stage 4: Limit to top 10
    { $limit: 10 },

    // Stage 5: Join with users collection
    { $lookup: {
        from: "users",
        localField: "_id",
        foreignField: "_id",
        as: "user"
    }},

    // Stage 6: Reshape output
    { $project: {
        userName: { $arrayElemAt: ["$user.name", 0] },
        totalSpent: 1,
        orderCount: 1,
        avgOrder: { $round: ["$avgOrder", 2] }
    }}
]);

Common Stages

Stage	Purpose
`$match`	Filter documents (like WHERE)
`$group`	Group by field, apply accumulators ($sum, $avg, $min, $max, $push)
`$sort`	Sort documents
`$project`	Reshape: include/exclude/compute fields
`$lookup`	Left outer join with another collection
`$unwind`	Deconstruct array field → one doc per element
`$limit / $skip`	Pagination
`$addFields`	Add computed fields
`$facet`	Run multiple pipelines in parallel on same input

⚡

MongoDB Indexes

// Single field index
db.users.createIndex({ email: 1 });              // ascending
db.users.createIndex({ email: 1 }, { unique: true });

// Compound index
db.orders.createIndex({ user_id: 1, created_at: -1 });

// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" });
db.articles.find({ $text: { $search: "mongodb nosql" } });

// TTL index (auto-delete after expiry)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

// Partial index
db.users.createIndex(
    { email: 1 },
    { partialFilterExpression: { status: "active" } }
);

// View indexes
db.users.getIndexes();

// Explain query (check if index is used)
db.users.find({ email: "a@b.com" }).explain("executionStats");

🏗️

MongoDB Schema Design

Embed vs Reference

Embed (Denormalize)

Data accessed together
One-to-few relationship
Data doesn't change often
Single read — no joins needed
Document size < 16 MB limit

Reference (Normalize)

Data accessed independently
One-to-many / many-to-many
Data changes frequently
Avoids duplication
Needs $lookup for joins

// ── Embedded pattern (blog post with comments) ──
{
    _id: ObjectId("..."),
    title: "MongoDB Guide",
    author: "Alice",
    comments: [
        { user: "Bob", text: "Great post!", date: ISODate("2024-01-15") },
        { user: "Charlie", text: "Very helpful", date: ISODate("2024-01-16") }
    ]
}

// ── Referenced pattern (users and orders) ──
// users collection
{ _id: ObjectId("u1"), name: "Alice", email: "alice@ex.com" }

// orders collection
{ _id: ObjectId("o1"), user_id: ObjectId("u1"), total: 99.99 }

Design Patterns

Pattern	Description	Use Case
Bucket	Group related data into buckets (e.g., by time)	IoT data, time-series
Computed	Pre-compute aggregations on write	Dashboards, leaderboards
Subset	Store frequent fields in main doc, rest in separate	Product listings (summary + details)
Polymorphic	Different doc structures in same collection	Mixed content types

🔴

Redis Basics

Redis is an in-memory key-value data store. It's used for caching, sessions, real-time analytics, queues, pub/sub, and more. Extremely fast — supports ~100k+ ops/sec.

# Connect
redis-cli -h localhost -p 6379

# Basic string operations
SET user:1:name "Alice"
GET user:1:name                  # → "Alice"
SET session:abc123 "data" EX 3600   # expires in 1 hour
TTL session:abc123               # seconds remaining
DEL user:1:name                  # delete key
EXISTS user:1:name               # 1 if exists, 0 if not

# Atomic increment (perfect for counters)
INCR page:views                  # increment by 1
INCRBY page:views 10             # increment by 10

# Multiple operations
MSET k1 "v1" k2 "v2" k3 "v3"
MGET k1 k2 k3

# Key patterns
KEYS user:*                      # find matching keys (DON'T use in production!)
SCAN 0 MATCH user:* COUNT 100    # safe cursor-based iteration

📊

Redis Data Types

# ── HASH (like a mini document) ──
HSET user:1 name "Alice" age 30 email "alice@ex.com"
HGET user:1 name                  # → "Alice"
HGETALL user:1                    # → all fields & values
HINCRBY user:1 age 1              # → 31

# ── LIST (ordered, duplicates allowed) ──
LPUSH queue:tasks "task1" "task2"  # push left
RPUSH queue:tasks "task3"          # push right
LPOP  queue:tasks                   # pop from left
RPOP  queue:tasks                   # pop from right
LRANGE queue:tasks 0 -1             # all elements
LLEN queue:tasks                    # length

# ── SET (unique, unordered) ──
SADD tags:article:1 "redis" "nosql" "database"
SMEMBERS tags:article:1             # all members
SISMEMBER tags:article:1 "redis"   # 1 if member, 0 if not
SINTER tags:article:1 tags:article:2  # intersection

# ── SORTED SET (unique, scored & ordered) ──
ZADD leaderboard 100 "Alice" 85 "Bob" 92 "Charlie"
ZRANGE leaderboard 0 -1 WITHSCORES   # ascending
ZREVRANGE leaderboard 0 2             # top 3 descending
ZRANK leaderboard "Alice"               # rank (0-based)
ZINCRBY leaderboard 5 "Alice"         # increment score

Type	Use Case
String	Caching, counters, session tokens
Hash	User profiles, object storage
List	Message queues, activity feeds, recent items
Set	Tags, unique visitors, social connections
Sorted Set	Leaderboards, priority queues, rate limiters
Stream	Event sourcing, log processing

🧩

Redis Patterns

Cache-Aside (Lazy Loading)

# Pseudocode: check cache first, then DB
def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss → fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache with TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Pub/Sub

# Publisher
PUBLISH notifications "New order received!"

# Subscriber (in another terminal)
SUBSCRIBE notifications
# Blocks and receives messages in real-time

Rate Limiting (Sliding Window)

# Using sorted set for sliding window rate limiter
def is_rate_limited(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()

    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # remove old entries
    pipe.zadd(key, {str(now): now})                # add current request
    pipe.zcard(key)                                 # count requests
    pipe.expire(key, window)                        # set TTL
    _, _, count, _ = pipe.execute()

    return count > limit

Persistence Options

Option	How	Trade-off
RDB	Point-in-time snapshots at intervals	Fast recovery, may lose recent data
AOF	Append every write operation to log	More durable, larger file, slower recovery
RDB + AOF	Both combined	Best durability, recommended for production

⚡

DynamoDB

AWS managed NoSQL key-value and document database. Designed for single-digit millisecond latency at any scale.

Core Concepts

Concept	Description
Table	Collection of items (similar to a SQL table)
Item	A single record (like a row)
Partition Key (PK)	Primary key — determines which partition stores the item
Sort Key (SK)	Optional — allows multiple items per partition key (range queries)
GSI	Global Secondary Index — different PK/SK for alternative access
LSI	Local Secondary Index — same PK, different SK

Access Patterns

# DynamoDB is access-pattern-driven!
# Design your table around HOW you'll query it

# Example: E-commerce orders
# PK: USER#alice    SK: ORDER#2024-01-15#001
# PK: USER#alice    SK: PROFILE
# PK: USER#alice    SK: ADDRESS#home

# Single-table design: multiple entity types in one table
# Enables fetching related data in a single query

# Query: Get all orders for a user
table.query(
    KeyConditionExpression="PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues={
        ":pk": "USER#alice",
        ":prefix": "ORDER#"
    }
)

💡 DynamoDB Design Rules

Know your access patterns first — design the schema around queries
Single-table design — store multiple entity types in one table
Avoid scans — queries must use PK, with optional SK conditions
Use GSIs for alternative query patterns

🎯

Choosing the Right Database

Requirement	Best Choice	Why
Complex queries, relationships	PostgreSQL	Rich SQL, JOINs, ACID
Flexible documents, rapid iteration	MongoDB	Schema-less, easy to start
Caching, sessions, real-time	Redis	In-memory speed, rich data types
Massive scale, simple key access	DynamoDB	Managed, auto-scaling, predictable
Time-series, high write throughput	Cassandra	Distributed writes, linear scaling
Full-text search, analytics	Elasticsearch	Inverted indexes, aggregation
Social graphs, relationships	Neo4j	Native graph traversal

💡 Interview Tip: In system design interviews, justify your database choice:

"I'd use PostgreSQL for the core transactional data because we need ACID…"
"…with Redis in front for caching hot queries to reduce latency…"
"…and Elasticsearch for the search feature since we need full-text search with fuzzy matching."