NoSQL Cheatsheet

MongoDB, Redis, DynamoDB β€” CRUD, data modeling, indexing, caching patterns & when to use what

Database / NoSQL
Contents
πŸ“¦

NoSQL Overview

NoSQL (Not Only SQL) databases are designed for specific data models and have flexible schemas. They excel at scalability, performance, and handling unstructured or semi-structured data.

When to Use NoSQL

πŸ—‚οΈ

NoSQL Types

TypeDescriptionExamplesBest For
DocumentJSON/BSON documents with nested structuresMongoDB, CouchDB, FirestoreContent management, catalogs, user profiles
Key-ValueSimple key β†’ value pairs, extremely fastRedis, DynamoDB, etcdCaching, sessions, config, rate limiting
Column-FamilyData stored in column families, wide rowsCassandra, HBase, ScyllaDBTime-series, IoT, analytics at scale
GraphNodes + edges with propertiesNeo4j, Amazon NeptuneSocial networks, recommendations, fraud detection
Search EngineFull-text search with inverted indexesElasticsearch, SolrLog analysis, search, autocomplete
βš–οΈ

SQL vs NoSQL

AspectSQL (PostgreSQL, MySQL)NoSQL (MongoDB, Redis)
SchemaFixed schema, migrations neededFlexible / schema-less
Query LanguageSQL (standardized)Database-specific APIs
RelationshipsJOINs β€” normalized dataEmbedded / denormalized data
ScalingVertical (scale up)Horizontal (scale out / sharding)
TransactionsStrong ACIDVaries (eventual to strong consistency)
Data ModelTables with rows & columnsDocuments, key-value, columns, graphs
Best ForComplex queries, relationships, reportingHigh scale, flexible data, specific access patterns
πŸ’‘ Not either/or! Many modern systems use polyglot persistence β€” SQL for transactions, Redis for caching, Elasticsearch for search, MongoDB for flexible documents.
πŸƒ

MongoDB Basics

MongoDB stores data as BSON documents (binary JSON) in collections (β‰ˆ tables).

Terminology Mapping

SQLMongoDB
DatabaseDatabase
TableCollection
RowDocument
ColumnField
Primary Key_id (auto-generated ObjectId)
JOIN$lookup (aggregation) or embed
IndexIndex
// Connect via mongosh
mongosh "mongodb://localhost:27017/mydb"

// Basic shell commands
show dbs                     // list databases
use mydb                     // switch database
show collections             // list collections
db.users.stats()             // collection statistics
db.dropDatabase()            // drop current database
✏️

MongoDB CRUD

// ── INSERT ──
db.users.insertOne({
    name: "Alice",
    email: "alice@example.com",
    age: 30,
    tags: ["admin", "developer"],
    address: { city: "NYC", zip: "10001" }
});

db.users.insertMany([
    { name: "Bob", age: 25 },
    { name: "Charlie", age: 35 }
]);

// ── READ ──
db.users.find({});                          // all documents
db.users.findOne({ email: "alice@example.com" });
db.users.find({ age: { $gte: 25 } });       // age >= 25
db.users.find({}, { name: 1, email: 1 });   // projection (only name & email)
db.users.find({}).sort({ age: -1 }).limit(10);

// ── UPDATE ──
db.users.updateOne(
    { email: "alice@example.com" },
    { $set: { age: 31, role: "admin" } }
);

db.users.updateMany(
    { age: { $lt: 18 } },
    { $set: { status: "minor" } }
);

// Upsert (insert if not found)
db.users.updateOne(
    { email: "new@user.com" },
    { $set: { name: "New User" } },
    { upsert: true }
);

// ── DELETE ──
db.users.deleteOne({ email: "bob@example.com" });
db.users.deleteMany({ age: { $lt: 18 } });
db.users.drop();  // delete entire collection
πŸ”

MongoDB Query Operators

Comparison Operators

OperatorDescriptionExample
$eqEqual{ age: { $eq: 25 } }
$neNot equal{ status: { $ne: "inactive" } }
$gt / $gteGreater than / >={ age: { $gte: 18 } }
$lt / $lteLess than / <={ price: { $lt: 100 } }
$inIn array{ role: { $in: ["admin", "mod"] } }
$ninNot in array{ role: { $nin: ["banned"] } }

Logical Operators

// AND (implicit)
db.users.find({ age: { $gte: 18 }, role: "admin" });

// OR
db.users.find({ $or: [{ age: { $lt: 18 } }, { role: "minor" }] });

// NOT
db.users.find({ age: { $not: { $lt: 18 } } });

// Array queries
db.users.find({ tags: "admin" });                    // has "admin" in array
db.users.find({ tags: { $all: ["admin", "dev"] } }); // has both
db.users.find({ tags: { $size: 3 } });                // array length = 3
db.users.find({ "tags.0": "admin" });                 // first element

// Nested document query
db.users.find({ "address.city": "NYC" });

// Exists
db.users.find({ phone: { $exists: true } });

// Regex
db.users.find({ name: { $regex: /^ali/i } });
πŸ“Š

MongoDB Aggregation Pipeline

The aggregation pipeline processes documents through a sequence of stages β€” similar to Unix pipes.

db.orders.aggregate([
    // Stage 1: Filter
    { $match: { status: "completed" } },

    // Stage 2: Group by user and calculate stats
    { $group: {
        _id: "$user_id",
        totalSpent: { $sum: "$total" },
        orderCount: { $sum: 1 },
        avgOrder:   { $avg: "$total" }
    }},

    // Stage 3: Sort by total spent descending
    { $sort: { totalSpent: -1 } },

    // Stage 4: Limit to top 10
    { $limit: 10 },

    // Stage 5: Join with users collection
    { $lookup: {
        from: "users",
        localField: "_id",
        foreignField: "_id",
        as: "user"
    }},

    // Stage 6: Reshape output
    { $project: {
        userName: { $arrayElemAt: ["$user.name", 0] },
        totalSpent: 1,
        orderCount: 1,
        avgOrder: { $round: ["$avgOrder", 2] }
    }}
]);

Common Stages

StagePurpose
$matchFilter documents (like WHERE)
$groupGroup by field, apply accumulators ($sum, $avg, $min, $max, $push)
$sortSort documents
$projectReshape: include/exclude/compute fields
$lookupLeft outer join with another collection
$unwindDeconstruct array field β†’ one doc per element
$limit / $skipPagination
$addFieldsAdd computed fields
$facetRun multiple pipelines in parallel on same input
⚑

MongoDB Indexes

// Single field index
db.users.createIndex({ email: 1 });              // ascending
db.users.createIndex({ email: 1 }, { unique: true });

// Compound index
db.orders.createIndex({ user_id: 1, created_at: -1 });

// Text index (full-text search)
db.articles.createIndex({ title: "text", body: "text" });
db.articles.find({ $text: { $search: "mongodb nosql" } });

// TTL index (auto-delete after expiry)
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

// Partial index
db.users.createIndex(
    { email: 1 },
    { partialFilterExpression: { status: "active" } }
);

// View indexes
db.users.getIndexes();

// Explain query (check if index is used)
db.users.find({ email: "a@b.com" }).explain("executionStats");
πŸ—οΈ

MongoDB Schema Design

Embed vs Reference

Embed (Denormalize)

  • Data accessed together
  • One-to-few relationship
  • Data doesn't change often
  • Single read β€” no joins needed
  • Document size < 16 MB limit

Reference (Normalize)

  • Data accessed independently
  • One-to-many / many-to-many
  • Data changes frequently
  • Avoids duplication
  • Needs $lookup for joins
// ── Embedded pattern (blog post with comments) ──
{
    _id: ObjectId("..."),
    title: "MongoDB Guide",
    author: "Alice",
    comments: [
        { user: "Bob", text: "Great post!", date: ISODate("2024-01-15") },
        { user: "Charlie", text: "Very helpful", date: ISODate("2024-01-16") }
    ]
}

// ── Referenced pattern (users and orders) ──
// users collection
{ _id: ObjectId("u1"), name: "Alice", email: "alice@ex.com" }

// orders collection
{ _id: ObjectId("o1"), user_id: ObjectId("u1"), total: 99.99 }

Design Patterns

PatternDescriptionUse Case
BucketGroup related data into buckets (e.g., by time)IoT data, time-series
ComputedPre-compute aggregations on writeDashboards, leaderboards
SubsetStore frequent fields in main doc, rest in separateProduct listings (summary + details)
PolymorphicDifferent doc structures in same collectionMixed content types
πŸ”΄

Redis Basics

Redis is an in-memory key-value data store. It's used for caching, sessions, real-time analytics, queues, pub/sub, and more. Extremely fast β€” supports ~100k+ ops/sec.

# Connect
redis-cli -h localhost -p 6379

# Basic string operations
SET user:1:name "Alice"
GET user:1:name                  # β†’ "Alice"
SET session:abc123 "data" EX 3600   # expires in 1 hour
TTL session:abc123               # seconds remaining
DEL user:1:name                  # delete key
EXISTS user:1:name               # 1 if exists, 0 if not

# Atomic increment (perfect for counters)
INCR page:views                  # increment by 1
INCRBY page:views 10             # increment by 10

# Multiple operations
MSET k1 "v1" k2 "v2" k3 "v3"
MGET k1 k2 k3

# Key patterns
KEYS user:*                      # find matching keys (DON'T use in production!)
SCAN 0 MATCH user:* COUNT 100    # safe cursor-based iteration
πŸ“Š

Redis Data Types

# ── HASH (like a mini document) ──
HSET user:1 name "Alice" age 30 email "alice@ex.com"
HGET user:1 name                  # β†’ "Alice"
HGETALL user:1                    # β†’ all fields & values
HINCRBY user:1 age 1              # β†’ 31

# ── LIST (ordered, duplicates allowed) ──
LPUSH queue:tasks "task1" "task2"  # push left
RPUSH queue:tasks "task3"          # push right
LPOP  queue:tasks                   # pop from left
RPOP  queue:tasks                   # pop from right
LRANGE queue:tasks 0 -1             # all elements
LLEN queue:tasks                    # length

# ── SET (unique, unordered) ──
SADD tags:article:1 "redis" "nosql" "database"
SMEMBERS tags:article:1             # all members
SISMEMBER tags:article:1 "redis"   # 1 if member, 0 if not
SINTER tags:article:1 tags:article:2  # intersection

# ── SORTED SET (unique, scored & ordered) ──
ZADD leaderboard 100 "Alice" 85 "Bob" 92 "Charlie"
ZRANGE leaderboard 0 -1 WITHSCORES   # ascending
ZREVRANGE leaderboard 0 2             # top 3 descending
ZRANK leaderboard "Alice"               # rank (0-based)
ZINCRBY leaderboard 5 "Alice"         # increment score
TypeUse Case
StringCaching, counters, session tokens
HashUser profiles, object storage
ListMessage queues, activity feeds, recent items
SetTags, unique visitors, social connections
Sorted SetLeaderboards, priority queues, rate limiters
StreamEvent sourcing, log processing
🧩

Redis Patterns

Cache-Aside (Lazy Loading)

# Pseudocode: check cache first, then DB
def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss β†’ fetch from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache with TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Pub/Sub

# Publisher
PUBLISH notifications "New order received!"

# Subscriber (in another terminal)
SUBSCRIBE notifications
# Blocks and receives messages in real-time

Rate Limiting (Sliding Window)

# Using sorted set for sliding window rate limiter
def is_rate_limited(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()

    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # remove old entries
    pipe.zadd(key, {str(now): now})                # add current request
    pipe.zcard(key)                                 # count requests
    pipe.expire(key, window)                        # set TTL
    _, _, count, _ = pipe.execute()

    return count > limit

Persistence Options

OptionHowTrade-off
RDBPoint-in-time snapshots at intervalsFast recovery, may lose recent data
AOFAppend every write operation to logMore durable, larger file, slower recovery
RDB + AOFBoth combinedBest durability, recommended for production
⚑

DynamoDB

AWS managed NoSQL key-value and document database. Designed for single-digit millisecond latency at any scale.

Core Concepts

ConceptDescription
TableCollection of items (similar to a SQL table)
ItemA single record (like a row)
Partition Key (PK)Primary key β€” determines which partition stores the item
Sort Key (SK)Optional β€” allows multiple items per partition key (range queries)
GSIGlobal Secondary Index β€” different PK/SK for alternative access
LSILocal Secondary Index β€” same PK, different SK

Access Patterns

# DynamoDB is access-pattern-driven!
# Design your table around HOW you'll query it

# Example: E-commerce orders
# PK: USER#alice    SK: ORDER#2024-01-15#001
# PK: USER#alice    SK: PROFILE
# PK: USER#alice    SK: ADDRESS#home

# Single-table design: multiple entity types in one table
# Enables fetching related data in a single query

# Query: Get all orders for a user
table.query(
    KeyConditionExpression="PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues={
        ":pk": "USER#alice",
        ":prefix": "ORDER#"
    }
)
πŸ’‘ DynamoDB Design Rules
  • Know your access patterns first β€” design the schema around queries
  • Single-table design β€” store multiple entity types in one table
  • Avoid scans β€” queries must use PK, with optional SK conditions
  • Use GSIs for alternative query patterns
🎯

Choosing the Right Database

RequirementBest ChoiceWhy
Complex queries, relationshipsPostgreSQLRich SQL, JOINs, ACID
Flexible documents, rapid iterationMongoDBSchema-less, easy to start
Caching, sessions, real-timeRedisIn-memory speed, rich data types
Massive scale, simple key accessDynamoDBManaged, auto-scaling, predictable
Time-series, high write throughputCassandraDistributed writes, linear scaling
Full-text search, analyticsElasticsearchInverted indexes, aggregation
Social graphs, relationshipsNeo4jNative graph traversal
πŸ’‘ Interview Tip: In system design interviews, justify your database choice:
  • "I'd use PostgreSQL for the core transactional data because we need ACID…"
  • "…with Redis in front for caching hot queries to reduce latency…"
  • "…and Elasticsearch for the search feature since we need full-text search with fuzzy matching."