MongoDB: Beginner to Advanced

📅 Apr 2025 ⏱ 20 min read

MongoDB is one of the most widely used NoSQL document databases. This guide takes you from installing MongoDB and writing your first query all the way to indexing strategies, aggregation pipelines, transactions, and production best practices.

What is MongoDB?

MongoDB is a document-oriented NoSQL database that stores data as flexible, JSON-like documents encoded in BSON (Binary JSON). Unlike relational databases, it does not impose a rigid schema by default: each document in a collection can have a different structure.

It is designed for horizontal scalability, high availability, and developer productivity. You can model one-to-many and many-to-many relationships naturally, embed related data in a single document, or reference across collections — all depending on your access patterns.

💡

Key insight: In MongoDB, a database contains collections, which contain documents. In SQL terms, database → collection → document corresponds to database → table → row, but with far more flexibility.

This guide progresses through 6 levels: Setup, CRUD, Querying, Indexing, Aggregation, and Advanced topics like transactions and replication.

Level 2 — CRUD Operations

CRUD operations target a single collection at a time. All single-document writes are atomic. MongoDB auto-creates collections on first insert.

CREATE
insertOne() insertMany()

Insert one or multiple documents. MongoDB generates _id automatically if not provided. insertMany() is ordered by default — it stops on first error unless { ordered: false }.

// insertOne — returns insertedId
db.orders.insertOne({
  customerId: ObjectId("64f1a2b3c4d5e6f7a8b9c0d1"),
  items: [{ sku: "ABC", qty: 2, price: 999 }],
  status: "pending",
  total: 1998,
  createdAt: new Date()
})

// insertMany — ordered:false continues on error
db.products.insertMany([
  { name: "Keyboard", price: 1499, stock: 50  },
  { name: "Mouse",    price: 799,  stock: 200 },
  { name: "Monitor",  price: 12999, stock: 15  }
], { ordered: false })
🔍 READ
find() findOne() countDocuments()

Query documents using a filter object. find() returns a cursor — iterate it or chain .toArray(). Use projection to return only needed fields. Dot notation queries nested fields.

// find with filter + projection
db.orders.find(
  { status: "pending", total: { $gt: 1000 } },
  { customerId: 1, total: 1, createdAt: 1, _id: 0 }
)

// Dot notation — nested fields
db.products.find({ "specs.bluetooth": "5.3" })

// Sort + limit + skip (pagination)
db.products.find()
  .sort({ price: -1 })
  .skip(0).limit(10)

// Count matching documents
db.orders.countDocuments({ status: "pending" })
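
The skip/limit pattern above generalizes to page numbers. The following is a minimal sketch, assuming 1-based page numbers; `pageWindow` is a hypothetical helper for illustration, not part of any MongoDB API.

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) throw new RangeError("page must be >= 1")
  if (!Number.isInteger(pageSize) || pageSize < 1) throw new RangeError("pageSize must be >= 1")
  return { skip: (page - 1) * pageSize, limit: pageSize }
}

const { skip, limit } = pageWindow(3, 10)
console.log(skip, limit) // 20 10

// In mongosh the values would feed the cursor like this:
// db.products.find().sort({ price: -1, _id: 1 }).skip(skip).limit(limit)
```

Adding _id as a tiebreaker in the sort keeps the order stable between pages when several products share a price. The server still has to walk past every skipped document, so large offsets get progressively slower.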
✏️ UPDATE
updateOne() updateMany() findOneAndUpdate() replaceOne()

Modify documents using update operators. $set adds/updates fields. $unset removes fields. $inc increments numbers. $push appends to arrays. Use upsert:true to insert if not found.

// $set — update specific fields only
db.orders.updateOne(
  { _id: ObjectId("...") },
  { $set: { status: "shipped", updatedAt: new Date() } }
)

// $inc — atomic stock decrement
db.products.updateOne(
  { name: "Monitor" },
  { $inc: { stock: -1 } }
)

// $push — append to array
db.users.updateOne(
  { email: "user@example.com" }, // placeholder address
  { $push: { loginHistory: new Date() } }
)

// upsert — insert if not found
db.settings.updateOne(
  { userId: "u123" },
  { $set: { theme: "dark", lang: "en" } },
  { upsert: true }
)
🗑️ DELETE
deleteOne() deleteMany() findOneAndDelete()

Remove documents matching a filter. deleteOne() removes the first match. deleteMany({}) empties a collection but keeps its indexes — unlike drop(). findOneAndDelete() returns the deleted document.

// deleteOne — remove first match
db.sessions.deleteOne({ sessionId: "abc123" })

// deleteMany — purge expired sessions
db.sessions.deleteMany({
  expiresAt: { $lt: new Date() }
})

// findOneAndDelete — retrieve doc before removing
const cancelled = await db.orders.findOneAndDelete(
  { _id: ObjectId("...") }
)
console.log(cancelled)

// Drop entire collection (removes indexes too)
db.tempLogs.drop()

Level 6 — Advanced Topics

Production-grade MongoDB: multi-document transactions, schema validation, replica sets, sharding, change streams, and performance tuning.

🔒

Multi-Document ACID Transactions

MongoDB supports distributed ACID transactions across multiple documents, collections, databases, and shards (4.0+ replica sets, 4.2+ sharded clusters). Transactions commit all changes or roll back entirely. Uncommitted changes are never visible outside the transaction. Default runtime limit is 1 minute via transactionLifetimeLimitSeconds.

javascript
// Atomic bank transfer — debit + credit + ledger
const session = client.startSession()

try {
  session.startTransaction({
    readConcern:  { level: "snapshot" },
    writeConcern: { w: "majority" }
  })

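  // Debit sender (the balance filter guards against overdraft)
  const debit = await db.accounts.updateOne(
    { _id: senderId, balance: { $gte: amount } },
    { $inc: { balance: -amount } },
    { session }
  )
  // No modified document means a missing account or insufficient funds
  if (debit.modifiedCount !== 1) throw new Error("Insufficient funds")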

  // Credit receiver
  await db.accounts.updateOne(
    { _id: receiverId },
    { $inc: { balance: +amount } },
    { session }
  )

  // Append to ledger
  await db.ledger.insertOne(
    { from: senderId, to: receiverId, amount, ts: new Date() },
    { session }
  )

  await session.commitTransaction()

} catch (err) {
  // Roll back every change, then surface the error to the caller
  await session.abortTransaction()
  throw err
} finally {
  await session.endSession()
}
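
The Node.js driver also offers session.withTransaction(), which starts the transaction, commits it, and retries on transient transaction errors. The sketch below assumes `client` is a connected MongoClient; the `transfer` function and the "bank" database name are illustrative, not taken from the example above.

```javascript
// Sketch: `client` is assumed to be a connected MongoClient (Node.js driver).
// The function name and the "bank" database name are hypothetical.
async function transfer(client, senderId, receiverId, amount) {
  const accounts = client.db("bank").collection("accounts")
  const session = client.startSession()
  try {
    // withTransaction commits on success and retries on transient errors.
    await session.withTransaction(async () => {
      const debit = await accounts.updateOne(
        { _id: senderId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      )
      // Throwing inside the callback aborts the transaction.
      if (debit.modifiedCount !== 1) throw new Error("Insufficient funds")
      await accounts.updateOne(
        { _id: receiverId },
        { $inc: { balance: amount } },
        { session }
      )
    }, { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } })
  } finally {
    await session.endSession()
  }
}
```

Because the callback may run more than once, it should contain only operations that are safe to repeat, and every operation inside it must pass the same session.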
📋

Schema Validation with $jsonSchema

Enforce document structure at the database level. validationAction: 'error' rejects invalid writes; 'warn' logs the violation but allows the write. validationLevel: 'strict' applies the rules to all inserts and updates; 'moderate' applies them to inserts and to updates of documents that already satisfy the rules, so existing invalid documents can still be updated.

javascript
db.createCollection("orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["customerId", "items", "total", "status"],
      properties: {
        customerId: { bsonType: "objectId" },
        items: {
          bsonType: "array", minItems: 1,
          items: {
            required: ["sku", "qty", "price"],
            properties: {
              sku:   { bsonType: "string" },
              qty:   { bsonType: "int", minimum: 1 },
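              // Accept any numeric type: whole-number prices are stored as int, not double
              price: { bsonType: ["int", "long", "double", "decimal"], minimum: 0 }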
            }
          }
        },
        status: {
          bsonType: "string",
          enum: ["pending", "processing", "shipped", "completed", "cancelled"]
        }
      }
    }
  },
  validationAction: "error",
  validationLevel:  "strict"
})
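
For a collection that already exists, the same validator can be attached with the collMod command. The sketch below only builds the command document; `buildCollMod` is a hypothetical helper and the trimmed schema is illustrative.

```javascript
// Hypothetical helper: build a collMod command that attaches a validator
// to an existing collection. "moderate" leaves already-invalid documents alone.
function buildCollMod(collection, schema, level = "moderate") {
  return {
    collMod: collection,
    validator: { $jsonSchema: schema },
    validationLevel: level,
    validationAction: "error"
  }
}

const cmd = buildCollMod("orders", {
  bsonType: "object",
  required: ["customerId", "status"],
  properties: { status: { enum: ["pending", "shipped"] } }
})
console.log(JSON.stringify(cmd, null, 2))
// In mongosh: db.runCommand(cmd)
```

Starting with 'moderate' is one way to introduce rules gradually on a collection that may contain legacy documents, switching to 'strict' once the data has been cleaned up.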
🔁

Replica Sets & High Availability

A replica set is a group of mongod processes maintaining the same dataset. One primary accepts all writes; secondaries replicate the primary's oplog asynchronously. If the primary fails, the remaining members hold an election and an eligible secondary becomes the new primary automatically. Production deployments should always use replica sets. Reads can be distributed to secondaries with read preferences, at the cost of possibly stale results.

javascript
// Initiate a 3-member replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 1 }
  ]
})

// Check status
rs.status()

// Prefer secondaries for reads, fall back to the primary (Node.js driver)
const client = new MongoClient(uri, {
  readPreference: "secondaryPreferred"
})

// Write concern — majority acknowledgement + journal
db.orders.insertOne(doc, {
  writeConcern: { w: "majority", j: true }
})
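
Elections and w: "majority" both depend on a majority of voting members. The arithmetic can be sketched as follows, assuming every member votes; the function names are illustrative.

```javascript
// Votes needed to elect a primary, and how many members can fail
// while the set can still elect one (all members assumed to vote).
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1
}
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers)
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n))
}
// 3 members: majority 2, tolerates 1 failure
// 4 members: majority 3, tolerates 1 failure
// 5 members: majority 3, tolerates 2 failures
```

A fourth member raises the majority without raising fault tolerance, which is why replica sets are usually sized with an odd number of voting members.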
🌐

Sharding — Horizontal Scaling

Sharding distributes data across multiple shards (each a replica set) using a shard key. The mongos router directs queries to the appropriate shards. Config servers store cluster metadata. Choose a shard key with high cardinality and even distribution to avoid hotspots. Treat the choice as hard to reverse: resharding on a new key is only available from MongoDB 5.0, and unsharding a collection only from MongoDB 8.0.

javascript
// Enable sharding on a database
sh.enableSharding("ecommerce")

// Hashed shard key — even distribution
sh.shardCollection(
  "ecommerce.orders",
  { customerId: "hashed" }
)

// Range shard key: efficient range queries, but a monotonically
// increasing key such as createdAt sends all new writes to one shard
sh.shardCollection(
  "ecommerce.events",
  { createdAt: 1 }
)

// Check sharding status
sh.status()
sh.getBalancerState()
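
The difference between the two shard key styles can be illustrated in memory. This is a simulation under stated assumptions: four shards, made-up range boundaries, and FNV-1a as a stand-in hash (MongoDB's hashed indexes use their own hash function).

```javascript
// Stand-in hash (FNV-1a, 32-bit). MongoDB uses a different hash function;
// this one only illustrates the effect of hashing the key.
function fnv1a(text) {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h >>> 0
}

const SHARDS = 4
// Assumed boundaries: shard i owns keys in [i * 1000, (i + 1) * 1000),
// and the last shard also takes everything above.
const rangeShard = key => Math.min(Math.floor(key / 1000), SHARDS - 1)
const hashShard = key => fnv1a(String(key)) % SHARDS

// Monotonically increasing keys, like timestamps of new events.
const newKeys = Array.from({ length: 1000 }, (_, i) => 5000 + i)
const count = pick => {
  const perShard = new Array(SHARDS).fill(0)
  for (const k of newKeys) perShard[pick(k)]++
  return perShard
}

console.log("range: ", count(rangeShard)) // every write hits the last shard
console.log("hashed:", count(hashShard))  // writes spread across shards
```

The trade-off runs the other way for reads: a range query on a hashed key has to be sent to every shard, while a range-sharded key lets mongos target only the shards that own the requested interval.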
📡

Change Streams — Real-Time Events

Change streams (MongoDB 3.6+) let applications subscribe to real-time data changes on a collection, database, or deployment — without tailing the oplog. Available on WiredTiger replica sets and sharded clusters. Each event includes operationType, documentKey, and optional pre/post images (MongoDB 6.0+). Streams are resumable via a resumeToken.

javascript
// Watch only inserts and updates for pending orders
const pipeline = [
  { $match: {
    operationType: { $in: ["insert", "update"] },
    "fullDocument.status": "pending"
  }}
]

const changeStream = db.collection("orders").watch(pipeline, {
  fullDocument: "updateLookup"
})

for await (const change of changeStream) {
  const { operationType, fullDocument } = change

  if (operationType === "insert")
    await notifyWarehouse(fullDocument)
  if (operationType === "update")
    await sendStatusEmail(fullDocument)

  // Save token to resume after restart
  saveResumeToken(change._id)
}

// Resume from last known position
const stream = db.collection("orders").watch([], {
  resumeAfter: savedResumeToken
})
⚖️

Read & Write Concerns

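Write concern controls how many replica set members must acknowledge a write. w: 'majority' means an acknowledged write will not be rolled back if the primary fails over. Read concern controls consistency and isolation: 'local' (the default, fastest, may return data that is later rolled back), 'majority' (data acknowledged by a majority of members), 'snapshot' (point-in-time view of majority-committed data, commonly used in multi-document transactions).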

javascript
// Strong write — majority + journal flush
db.payments.insertOne(doc, {
  writeConcern: { w: "majority", j: true, wtimeout: 5000 }
})

// Snapshot read concern — used in transactions
const session = client.startSession()
session.startTransaction({
  readConcern:  { level: "snapshot" },
  writeConcern: { w: "majority" }
})

// Read preference (separate from read concern): route reads to the lowest-latency member
db.collection("analytics").find(
  { date: today },
  { readPreference: "nearest" }
)

Performance Best Practices

Key practices from the official MongoDB performance documentation.

  • Follow the ESR guideline for compound indexes: Equality → Sort → Range fields
  • Use covered queries — all queried and projected fields must be in the index so MongoDB never reads documents
  • Place $match (with indexed fields) as the very first aggregation stage to leverage indexes
  • Use projection to return only needed fields — reduces document size in transit
  • Prefer embedding over referencing for data always read together — avoids $lookup overhead
  • Keep working set (indexes + hot data) in RAM — monitor with db.stats() and db.serverStatus()
  • Use connection pooling in all drivers — never create a new MongoClient per request
  • Monitor slow queries with db.setProfilingLevel(1, { slowms: 100 }) and inspect system.profile
  • Avoid unbounded array growth: move the data to a separate collection, or cap the array with $push and the $slice modifier
  • Avoid $where and JavaScript queries — they cannot use indexes and run in a JS interpreter
  • Minimise index count on write-heavy collections — each write must update all indexes
  • Use bulkWrite() to batch inserts/updates and reduce network round trips
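
The bulkWrite() advice above can be sketched as follows. `toBulkOps` is a hypothetical helper, and the sku field on products is an assumption made for the example.

```javascript
// Hypothetical helper: turn stock adjustments into bulkWrite operations.
// The `sku` field on products is assumed for illustration.
function toBulkOps(adjustments) {
  return adjustments.map(({ sku, delta }) => ({
    updateOne: {
      filter: { sku },
      update: { $inc: { stock: delta } }
    }
  }))
}

const ops = toBulkOps([
  { sku: "ABC", delta: -2 },
  { sku: "XYZ", delta: 5 }
])
console.log(JSON.stringify(ops))
// One round trip instead of two (mongosh):
// db.products.bulkWrite(ops, { ordered: false })
```

With ordered: false the server may apply the operations in any order and continues past individual failures, which suits independent updates like these.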

Level 5 — Aggregation Pipeline

The aggregation pipeline passes documents through a sequence of stages — each stage transforms the output of the previous. Pipelines run with db.collection.aggregate() and do not modify source documents unless using $merge or $out. Put $match first to leverage indexes and reduce data early.

🔍
$match

Filters documents — identical syntax to find(). Place as early as possible to use indexes. MongoDB can use an index during $match if it is the first stage in the pipeline.

{ $match: { status: "completed", createdAt: { $gte: ISODate("2024-01-01") } } }
📦
$group

Groups by an _id expression and computes per-group aggregations. Accumulators: $sum, $avg, $min, $max, $first, $last, $push, $addToSet, $count.

{ $group: {
  _id: "$status",
  count:      { $sum: 1 },
  totalValue: { $sum: "$total" },
  avgValue:   { $avg: "$total" },
  firstOrder: { $min: "$createdAt" } // earliest order; $first would depend on input order
}}
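
To make the accumulators concrete, here is an in-memory illustration of what $sum and $avg compute per group. The sample orders are made up, and the loop is only an analogy, not how the server executes $group.

```javascript
// In-memory illustration of per-group accumulation; sample data is made up.
const orders = [
  { status: "completed", total: 1000 },
  { status: "completed", total: 3000 },
  { status: "pending", total: 500 }
]

const groups = new Map()
for (const { status, total } of orders) {
  const g = groups.get(status) ?? { _id: status, count: 0, totalValue: 0 }
  g.count += 1          // { $sum: 1 }
  g.totalValue += total // { $sum: "$total" }
  groups.set(status, g)
}
// { $avg: "$total" } is the group's sum divided by its count
const result = [...groups.values()].map(g => ({ ...g, avgValue: g.totalValue / g.count }))
console.log(result)
```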
🎯
$project

Reshapes documents — include (1), exclude (0), rename, or compute new fields. Place late in the pipeline to shape only the final output.

{ $project: {
  _id: 0,
  orderId:   "$_id",
  fullName: { $concat: ["$firstName", " ", "$lastName"] },
  discounted: { $multiply: ["$total", 0.9] },
  year: { $year: "$createdAt" }
}}
🔗
$lookup

Left outer join from another collection. Joined documents become an array field. Supports both simple equality joins and complex pipeline-based joins (MongoDB 3.6+).

{ $lookup: {
  from: "users",
  localField: "customerId",
  foreignField: "_id",
  as: "customer"
}}
📂
$unwind

Deconstructs an array field — outputs one document per element. Use preserveNullAndEmptyArrays to keep docs with missing/empty arrays.

{ $unwind: {
  path: "$items",
  preserveNullAndEmptyArrays: true,
  includeArrayIndex: "itemIndex"
}}
$addFields / $set

Adds or overwrites fields without removing others. $set is an alias. Useful for computed fields mid-pipeline.

{ $addFields: {
  totalWithTax: { $multiply: ["$total", 1.18] },
  isHighValue:  { $gt: ["$total", 5000] },
  month: { $month: "$createdAt" }
}}
🪟
$facet

Runs multiple independent sub-pipelines on the same input documents. Perfect for dashboards needing counts, top results, and charts from a single query.

{ $facet: {
  statusSummary: [
    { $group: { _id: "$status", count: { $sum: 1 } } }
  ],
  topProducts: [
    { $unwind: "$items" },
    { $group: { _id: "$items.sku", sold: { $sum: "$items.qty" } } },
    { $sort: { sold: -1 } }, { $limit: 5 }
  ]
}}
💾
$out / $merge

$out writes results to a new collection (replaces entirely). $merge (MongoDB 4.2+) upserts into an existing collection. Both must be the last stage.

// Write to new collection
{ $out: "monthly_revenue_report" }

// Merge / upsert into existing
{ $merge: {
  into: "daily_stats",
  on: "_id",
  whenMatched: "merge",
  whenNotMatched: "insert"
}}
javascript
// E-commerce revenue dashboard pipeline
db.orders.aggregate([

  // Stage 1: completed orders since 1 Jan 2024
  { $match: {
    status: "completed",
    createdAt: { $gte: new Date("2024-01-01") }
  }},

  // Stage 2: join users collection
  { $lookup: {
    from: "users",
    localField: "customerId",
    foreignField: "_id",
    as: "customer"
  }},

  // Stage 3: flatten customer array
  { $unwind: "$customer" },

  // Stage 4: compute tax + extract month
  { $addFields: {
    totalWithTax: { $multiply: ["$total", 1.18] },
    month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } }
  }},

  // Stage 5: group by month
  { $group: {
    _id: "$month",
    revenue:    { $sum: "$totalWithTax" },
    orderCount: { $sum: 1 },
    customers:  { $addToSet: "$customerId" }
  }},

  // Stage 6: shape + sort output
  { $project: {
    _id: 0, month: "$_id",
    revenue: { $round: ["$revenue", 2] },
    orderCount: 1,
    uniqueCustomers: { $size: "$customers" }
  }},

  { $sort: { month: 1 } }
])

Level 3 — Querying & Operators

MongoDB's query language supports comparison, logical, array, element, and evaluation operators. The same filter syntax works across find(), the update methods (updateOne(), updateMany()), and the delete methods (deleteOne(), deleteMany()).

Comparison Operators

$eq Matches values equal to a specified value (implicit for most queries)
$ne Matches values not equal to specified value
$gt Greater than — numeric, date, string (lexicographic)
$gte Greater than or equal to
$lt Less than
$lte Less than or equal to
$in Matches any value in the provided array
$nin Matches none of the values in the array

Logical Operators

$and Join query clauses — all must be true (default for multiple fields)
$or At least one clause must match
$nor None of the clauses match
$not Inverts a single operator expression

Array Operators

$all Array must contain all specified values
$elemMatch At least one array element matches all criteria
$size Array has exact number of elements

Element & Evaluation Operators

$exists Field exists (true) or does not exist (false)
$type Field matches a BSON type (e.g. 'string', 'int', 'date')
$regex String matches a regular expression pattern
$expr Use aggregation expressions inside a query filter
$mod Field value modulo divisor equals remainder
javascript
// Comparison: orders between £500–£5000
db.orders.find({
  total: { $gte: 500, $lte: 5000 },
  status: { $in: ["pending", "processing"] }
})

// Logical OR — high-value or urgent
db.orders.find({
  $or: [
    { total: { $gt: 10000 } },
    { priority: "urgent" }
  ]
})

// elemMatch: item with qty>1 AND price>500
db.orders.find({
  items: { $elemMatch: { qty: { $gt: 1 }, price: { $gt: 500 } } }
})

// $expr: find projects where spent exceeds budget
db.projects.find({ $expr: { $gt: ["$spent", "$budget"] } })

// $exists + $regex combo
db.products.find({
  discount: { $exists: true },
  name: { $regex: /headphone/i }
})

// Dot notation — nested + array index
db.orders.find({ "items.0.sku": "ABC" })
db.users.find({ "address.city": "Bangalore" })
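
The reason $elemMatch matters is easiest to see side by side. The following in-memory illustration uses a made-up order and plain array methods to mirror the two matching rules.

```javascript
// In-memory illustration of the two matching rules; the sample order is made up.
const order = {
  items: [
    { sku: "A", qty: 5, price: 100 }, // qty > 1 only
    { sku: "B", qty: 1, price: 900 }  // price > 500 only
  ]
}

// { "items.qty": { $gt: 1 }, "items.price": { $gt: 500 } }
// Each condition may be satisfied by a different element.
const separateConditions =
  order.items.some(i => i.qty > 1) && order.items.some(i => i.price > 500)

// { items: { $elemMatch: { qty: { $gt: 1 }, price: { $gt: 500 } } } }
// A single element must satisfy both conditions.
const elemMatch = order.items.some(i => i.qty > 1 && i.price > 500)

console.log(separateConditions, elemMatch) // true false
```

The dot-notation filter would return this order even though no single item is both bought in quantity and expensive, so use $elemMatch whenever the conditions must hold for the same array element.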

Level 1 — Setup & Core Concepts

Get MongoDB running locally and understand its fundamental building blocks before writing a single query.

Install MongoDB Community Edition

Download from mongodb.com or use a package manager. On macOS: brew tap mongodb/brew && brew install mongodb-community (append @<version> to pin a specific release). On Ubuntu, add MongoDB's official apt repository first, then run sudo apt-get install -y mongodb-org. Start the service: brew services start mongodb-community (macOS) or sudo systemctl start mongod (Linux).

Connect with mongosh

MongoDB Shell (mongosh) is the official CLI. Run mongosh to connect to a local instance at mongodb://127.0.0.1:27017. For Atlas: mongosh "mongodb+srv://cluster0.example.mongodb.net/" --apiVersion 1 --username admin.

Core Concepts: Database → Collection → Document

A document is a BSON record — like a JSON object but binary-encoded with richer types (ObjectId, Date, Decimal128, etc.). A collection is a group of documents — schema-less by default. A database is a namespace holding collections. MongoDB auto-creates a collection on first insert.

Essential Shell Commands

Switch database: use mydb. List databases: show dbs (only shows DBs with data). List collections: show collections. Current DB: db. Drop a collection: db.users.drop(). Drop a database: db.dropDatabase().

ObjectId — MongoDB's Primary Key

Every document gets an _id of type ObjectId automatically unless you provide one. An ObjectId is 12 bytes: a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter. Because it encodes its creation time, calling getTimestamp() on an ObjectId returns the date it was created.

javascript
// Connect and explore
use ecommerce

// Insert creates the collection automatically
db.products.insertOne({
  name: "Wireless Headphones",
  price: 2999,
  stock: 120,
  tags: ["electronics", "audio"],
  specs: { battery: "30h", bluetooth: "5.3" },
  createdAt: new Date()
})

// ObjectId encodes creation time
const id = ObjectId()
console.log(id.getTimestamp()) // ISODate

// Show all collections
show collections
// Output: products
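
The timestamp part of an ObjectId can be decoded by hand, which shows what getTimestamp() reads. This is a minimal sketch in plain JavaScript; `objectIdToDate` is a hypothetical helper, and the input is the ObjectId string used in the orders example of this guide.

```javascript
// Decode the creation time from an ObjectId's hex string:
// the first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new TypeError("expected 24 hex characters")
  const seconds = parseInt(hex.slice(0, 8), 16)
  return new Date(seconds * 1000)
}

console.log(objectIdToDate("64f1a2b3c4d5e6f7a8b9c0d1").toISOString())
// 2023-09-01T08:37:07.000Z
```

Because the timestamp comes first, ObjectIds sort roughly by creation time, with one-second resolution.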

Level 4 — Indexing

Without indexes, MongoDB performs a full collection scan (COLLSCAN) on every query. Indexes store a sorted subset of fields for fast lookups (IXSCAN). Follow the ESR guideline for compound indexes: Equality fields first, then Sort fields, then Range fields.

1️⃣
Single Field Index

Index on one field. The _id field is always indexed automatically. Supports ascending (1) or descending (-1) order.

db.users.createIndex({ email: 1 })
// with unique constraint
db.users.createIndex({ email: 1 }, { unique: true })
🔗
Compound Index (ESR Rule)

Follow ESR: Equality fields first, Sort fields second, Range fields last. Supports prefix queries. Up to 32 fields.

// ESR: role (equality) → createdAt (sort) → age (range)
db.users.createIndex({ role: 1, createdAt: -1, age: 1 })
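
The ESR ordering can be sketched as a small function that assembles the index specification. `esrIndex` is a hypothetical helper; it relies on JavaScript preserving insertion order for string keys, which is what makes field order in the spec meaningful.

```javascript
// Hypothetical helper: assemble a compound index spec in ESR order.
function esrIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {}
  for (const field of equality) spec[field] = 1                      // Equality first
  for (const [field, dir] of Object.entries(sort)) spec[field] = dir // then Sort
  for (const field of range) spec[field] = 1                         // Range last
  return spec
}

const spec = esrIndex({
  equality: ["role"],
  sort: { createdAt: -1 },
  range: ["age"]
})
console.log(JSON.stringify(spec)) // {"role":1,"createdAt":-1,"age":1}
// In mongosh: db.users.createIndex(spec)
// Serves e.g. find({ role: "admin", age: { $gte: 18 } }).sort({ createdAt: -1 })
```

Putting the sort field before the range field lets the server read index entries already in sorted order instead of sorting the range results in memory.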
📋
Multikey Index

Created automatically when you index an array field. MongoDB indexes each element separately. In a compound index, at most one indexed field per document may hold an array.

// tags is an array — automatically becomes multikey
db.products.createIndex({ tags: 1 })
🔤
Text Index

Enables $text full-text search. One per collection. Supports language-specific stemming and weighted fields.

db.articles.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 10, body: 1 } }
)
db.articles.find({ $text: { $search: "mongodb indexing" } })
TTL Index

Auto-deletes documents a set number of seconds after the value in a Date field. Must be a single-field index on a Date (or array of Dates) field. A background thread runs every 60s, so deletion is not immediate.

db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }
)
🔍
Partial Index

Indexes only documents matching a partialFilterExpression. Smaller and faster than a full index. Query must include the filter condition.

db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { active: { $eq: true } } }
)
🔒
Unique Index

Enforces uniqueness across the collection. A missing field is indexed as null, so by default only one document can omit the field. Combine with a partial filter expression to allow many documents without it.

// Unique compound — username unique per org
db.users.createIndex(
  { orgId: 1, username: 1 },
  { unique: true }
)
🗺️
Geospatial Index (2dsphere)

Supports $near, $geoWithin, $geoIntersects on GeoJSON. Store as { type: 'Point', coordinates: [lng, lat] }.

db.places.createIndex({ location: "2dsphere" })
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [77.59, 12.97] },
      $maxDistance: 1000
    }
  }
})
#️⃣
Hashed Index

Indexes the hash of a field's value. Used for hash-based sharding for even data distribution. Supports equality queries only — not range.

db.users.createIndex({ userId: "hashed" })

Diagnose with explain(): Run db.collection.explain('executionStats').find({...}) and inspect the stages in winningPlan. IXSCAN = index used ✅ (it usually appears as the inputStage of a FETCH stage). COLLSCAN = full scan ❌. Compare totalDocsExamined with nReturned: the closer they are, the more selective your index.
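
Since the index scan is often nested below another stage, it helps to walk the whole plan tree. The sketch below works on a trimmed, made-up winningPlan object; `planStages` is a hypothetical helper, and depending on server version the tree may sit under winningPlan.queryPlan instead.

```javascript
// Collect every stage name in a plan tree. Stages nest through
// `inputStage` (single child) or `inputStages` (several children).
function planStages(node, found = []) {
  if (!node) return found
  found.push(node.stage)
  planStages(node.inputStage, found)
  for (const child of node.inputStages ?? []) planStages(child, found)
  return found
}

// Trimmed, made-up example of a winningPlan for an indexed query.
const winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "email_1" }
}

const stages = planStages(winningPlan)
console.log(stages)                      // [ 'FETCH', 'IXSCAN' ]
console.log(stages.includes("COLLSCAN")) // false
```

In mongosh the input would come from something like db.users.find({...}).explain("executionStats").queryPlanner.winningPlan.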

MongoDB vs SQL — Concept Mapping

Coming from a relational background? Here's how SQL concepts map to MongoDB.

Concept | SQL Equivalent | MongoDB Term | Notes
Storage unit | Row | Document | BSON format, max 16 MB
Group of records | Table | Collection | Schema-less by default
Namespace | Database | Database | Same concept
Primary key | id (auto-increment) | _id (ObjectId) | Auto-generated, globally unique
Join | JOIN | $lookup | Used in aggregation pipeline
Filter rows | WHERE | $match / find() | Query operators like $gt, $in
Group & aggregate | GROUP BY | $group | Part of aggregation pipeline
Index | INDEX | createIndex() | Supports compound, text, TTL, geo

Written for developers who want to master MongoDB from the ground up.
