What is MongoDB?
MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents called BSON (Binary JSON). Unlike relational databases, there are no rigid schemas — each document in a collection can have a different structure.
It is designed for horizontal scalability, high availability, and developer productivity. You can model one-to-many and many-to-many relationships naturally, embed related data in a single document, or reference across collections — all depending on your access patterns.
Key insight: In MongoDB, a database contains collections, which contain documents. Think: database → table → row in SQL terms, but far more flexible.
This guide progresses through 6 levels: Setup, CRUD, Querying, Indexing, Aggregation, and Advanced topics like transactions and replication.
Level 2 — CRUD Operations
CRUD operations target a single collection at a time. All single-document writes are atomic. MongoDB auto-creates collections on first insert.
insertOne()
insertMany()
Insert one or multiple documents. MongoDB generates _id automatically if not provided. insertMany() is ordered by default — it stops on first error unless { ordered: false }.
// insertOne — returns insertedId db.orders.insertOne({ customerId: ObjectId("64f1a2b3c4d5e6f7a8b9c0d1"), items: [{ sku: "ABC", qty: 2, price: 999 }], status: "pending", total: 1998, createdAt: new Date() }) // insertMany — ordered:false continues on error db.products.insertMany([ { name: "Keyboard", price: 1499, stock: 50 }, { name: "Mouse", price: 799, stock: 200 }, { name: "Monitor", price: 12999, stock: 15 } ], { ordered: false })
find()
findOne()
countDocuments()
Query documents using a filter object. find() returns a cursor — iterate it or chain .toArray(). Use projection to return only needed fields. Dot notation queries nested fields.
// find with filter + projection db.orders.find( { status: "pending", total: { $gt: 1000 } }, { customerId: 1, total: 1, createdAt: 1, _id: 0 } ) // Dot notation — nested fields db.products.find({ "specs.bluetooth": "5.3" }) // Sort + limit + skip (pagination) db.products.find() .sort({ price: -1 }) .skip(0).limit(10) // Count matching documents db.orders.countDocuments({ status: "pending" })
updateOne()
updateMany()
findOneAndUpdate()
replaceOne()
Modify documents using update operators. $set adds/updates fields. $unset removes fields. $inc increments numbers. $push appends to arrays. Use upsert:true to insert if not found.
// $set — update specific fields only db.orders.updateOne( { _id: ObjectId("...") }, { $set: { status: "shipped", updatedAt: new Date() } } ) // $inc — atomic stock decrement db.products.updateOne( { name: "Monitor" }, { $inc: { stock: -1 } } ) // $push — append to array db.users.updateOne( { email: "[email protected]" }, { $push: { loginHistory: new Date() } } ) // upsert — insert if not found db.settings.updateOne( { userId: "u123" }, { $set: { theme: "dark", lang: "en" } }, { upsert: true } )
deleteOne()
deleteMany()
findOneAndDelete()
Remove documents matching a filter. deleteOne() removes the first match. deleteMany({}) empties a collection but keeps its indexes — unlike drop(). findOneAndDelete() returns the deleted document.
// deleteOne — remove first match db.sessions.deleteOne({ sessionId: "abc123" }) // deleteMany — purge expired sessions db.sessions.deleteMany({ expiresAt: { $lt: new Date() } }) // findOneAndDelete — retrieve doc before removing const cancelled = await db.orders.findOneAndDelete( { _id: ObjectId("...") } ) console.log(cancelled) // Drop entire collection (removes indexes too) db.tempLogs.drop()
Level 6 — Advanced Topics
Production-grade MongoDB: multi-document transactions, schema validation, replica sets, sharding, change streams, and performance tuning.
Multi-Document ACID Transactions
MongoDB supports distributed ACID transactions across multiple documents, collections, databases, and shards (4.0+ replica sets, 4.2+ sharded clusters). Transactions commit all changes or roll back entirely. Uncommitted changes are never visible outside the transaction. Default runtime limit is 1 minute via transactionLifetimeLimitSeconds.
// Atomic bank transfer — debit + credit + ledger const session = client.startSession() try { session.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } }) // Debit sender await db.accounts.updateOne( { _id: senderId, balance: { $gte: amount } }, { $inc: { balance: -amount } }, { session } ) // Credit receiver await db.accounts.updateOne( { _id: receiverId }, { $inc: { balance: +amount } }, { session } ) // Append to ledger await db.ledger.insertOne( { from: senderId, to: receiverId, amount, ts: new Date() }, { session } ) await session.commitTransaction() } catch (err) { await session.abortTransaction() } finally { await session.endSession() }
Schema Validation with $jsonSchema
Enforce document structure at the database level. validationAction: 'error' rejects invalid writes. 'warn' logs without rejecting. validationLevel: 'strict' applies to all inserts and updates. 'moderate' applies only to inserts and updates of currently valid documents.
db.createCollection("orders", { validator: { $jsonSchema: { bsonType: "object", required: ["customerId", "items", "total", "status"], properties: { customerId: { bsonType: "objectId" }, items: { bsonType: "array", minItems: 1, items: { required: ["sku", "qty", "price"], properties: { sku: { bsonType: "string" }, qty: { bsonType: "int", minimum: 1 }, price: { bsonType: "double", minimum: 0 } } } }, status: { bsonType: "string", enum: ["pending", "processing", "shipped", "completed", "cancelled"] } } } }, validationAction: "error", validationLevel: "strict" })
Replica Sets & High Availability
A replica set is a group of mongod processes maintaining the same dataset. One primary accepts all writes; secondaries replicate the primary's oplog asynchronously. On primary failure, an eligible secondary holds an election automatically. All production deployments must use replica sets. Reads can be distributed to secondaries with read preferences.
// Initiate a 3-member replica set rs.initiate({ _id: "rs0", members: [ { _id: 0, host: "mongo1:27017", priority: 2 }, { _id: 1, host: "mongo2:27017", priority: 1 }, { _id: 2, host: "mongo3:27017", priority: 1 } ] }) // Check status rs.status() // Read from nearest secondary (Node.js driver) const client = new MongoClient(uri, { readPreference: "secondaryPreferred" }) // Write concern — majority acknowledgement + journal db.orders.insertOne(doc, { writeConcern: { w: "majority", j: true } })
Sharding — Horizontal Scaling
Sharding distributes data across multiple shards (each a replica set) using a shard key. The mongos router directs queries to the appropriate shards. Config servers store cluster metadata. Choose a shard key with high cardinality and even distribution to avoid hotspots. Once sharded, a collection cannot be unsharded.
// Enable sharding on a database sh.enableSharding("ecommerce") // Hashed shard key — even distribution sh.shardCollection( "ecommerce.orders", { customerId: "hashed" } ) // Range shard key — good for time-series range queries sh.shardCollection( "ecommerce.events", { createdAt: 1 } ) // Check sharding status sh.status() sh.getBalancerState()
Change Streams — Real-Time Events
Change streams (MongoDB 3.6+) let applications subscribe to real-time data changes on a collection, database, or deployment — without tailing the oplog. Available on WiredTiger replica sets and sharded clusters. Each event includes operationType, documentKey, and optional pre/post images (MongoDB 6.0+). Streams are resumable via a resumeToken.
// Watch only inserts and updates for pending orders const pipeline = [ { $match: { operationType: { $in: ["insert", "update"] }, "fullDocument.status": "pending" }} ] const changeStream = db.collection("orders").watch(pipeline, { fullDocument: "updateLookup" }) for await (const change of changeStream) { const { operationType, fullDocument } = change if (operationType === "insert") await notifyWarehouse(fullDocument) if (operationType === "update") await sendStatusEmail(fullDocument) // Save token to resume after restart saveResumeToken(change._id) } // Resume from last known position const stream = db.collection("orders").watch([], { resumeAfter: savedResumeToken })
Read & Write Concerns
Write concern controls how many replica set members must acknowledge a write. w: 'majority' ensures durability. Read concern controls consistency — 'local' (default, fastest), 'majority' (committed to majority), 'snapshot' (point-in-time, required for multi-shard transactions).
// Strong write — majority + journal flush db.payments.insertOne(doc, { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }) // Snapshot read concern — used in transactions const session = client.startSession() session.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } }) // Route reads to nearest replica db.collection("analytics").find( { date: today }, { readPreference: "nearest" } )
Performance Best Practices
Key practices from the official MongoDB performance documentation.
- Follow the ESR guideline for compound indexes: Equality → Sort → Range fields
- Use covered queries — all queried and projected fields must be in the index so MongoDB never reads documents
- Place $match (with indexed fields) as the very first aggregation stage to leverage indexes
- Use projection to return only needed fields — reduces document size in transit
- Prefer embedding over referencing for data always read together — avoids $lookup overhead
- Keep working set (indexes + hot data) in RAM — monitor with db.stats() and db.serverStatus()
- Use connection pooling in all drivers — never create a new MongoClient per request
- Monitor slow queries with db.setProfilingLevel(1, { slowms: 100 }) and inspect system.profile
- Avoid unbounded array growth — use separate collections or $slice projections
- Avoid $where and JavaScript queries — they cannot use indexes and run in a JS interpreter
- Minimise index count on write-heavy collections — each write must update all indexes
- Use bulkWrite() to batch inserts/updates and reduce network round trips
Level 5 — Aggregation Pipeline
The aggregation pipeline passes documents through a sequence of stages — each stage transforms the output of the previous. Pipelines run with db.collection.aggregate() and do not modify source documents unless using $merge or $out. Put $match first to leverage indexes and reduce data early.
$match
Filters documents — identical syntax to find(). Place as early as possible to use indexes. MongoDB can use an index during $match if it is the first stage in the pipeline.
{ $match: { status: "completed", createdAt: { $gte: ISODate("2024-01-01") } } }$group
Groups by an _id expression and computes per-group aggregations. Accumulators: $sum, $avg, $min, $max, $first, $last, $push, $addToSet, $count.
{ $group: {
_id: "$status",
count: { $sum: 1 },
totalValue: { $sum: "$total" },
avgValue: { $avg: "$total" },
firstOrder: { $first: "$createdAt" }
}}$project
Reshapes documents — include (1), exclude (0), rename, or compute new fields. Place late in the pipeline to shape only the final output.
{ $project: {
_id: 0,
orderId: "$_id",
fullName: { $concat: ["$firstName", " ", "$lastName"] },
discounted: { $multiply: ["$total", 0.9] },
year: { $year: "$createdAt" }
}}$lookup
Left outer join from another collection. Joined documents become an array field. Supports both simple equality joins and complex pipeline-based joins (MongoDB 3.6+).
{ $lookup: {
from: "users",
localField: "customerId",
foreignField: "_id",
as: "customer"
}}$unwind
Deconstructs an array field — outputs one document per element. Use preserveNullAndEmptyArrays to keep docs with missing/empty arrays.
{ $unwind: {
path: "$items",
preserveNullAndEmptyArrays: true,
includeArrayIndex: "itemIndex"
}}$addFields / $set
Adds or overwrites fields without removing others. $set is an alias. Useful for computed fields mid-pipeline.
{ $addFields: {
totalWithTax: { $multiply: ["$total", 1.18] },
isHighValue: { $gt: ["$total", 5000] },
month: { $month: "$createdAt" }
}}$facet
Runs multiple independent sub-pipelines on the same input documents. Perfect for dashboards needing counts, top results, and charts from a single query.
{ $facet: {
statusSummary: [
{ $group: { _id: "$status", count: { $sum: 1 } } }
],
topProducts: [
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", sold: { $sum: "$items.qty" } } },
{ $sort: { sold: -1 } }, { $limit: 5 }
]
}}$out / $merge
$out writes results to a new collection (replaces entirely). $merge (MongoDB 4.2+) upserts into an existing collection. Both must be the last stage.
// Write to new collection { $out: "monthly_revenue_report" } // Merge / upsert into existing { $merge: { into: "daily_stats", on: "_id", whenMatched: "merge", whenNotMatched: "insert" }}
// E-commerce revenue dashboard pipeline db.orders.aggregate([ // Stage 1: completed orders this year { $match: { status: "completed", createdAt: { $gte: new Date("2024-01-01") } }}, // Stage 2: join users collection { $lookup: { from: "users", localField: "customerId", foreignField: "_id", as: "customer" }}}, // Stage 3: flatten customer array { $unwind: "$customer" }, // Stage 4: compute tax + extract month { $addFields: { totalWithTax: { $multiply: ["$total", 1.18] }, month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } } }}, // Stage 5: group by month { $group: { _id: "$month", revenue: { $sum: "$totalWithTax" }, orderCount: { $sum: 1 }, customers: { $addToSet: "$customerId" } }}, // Stage 6: shape + sort output { $project: { _id: 0, month: "$_id", revenue: { $round: ["$revenue", 2] }, orderCount: 1, uniqueCustomers: { $size: "$customers" } }}, { $sort: { month: 1 } } ])
Level 3 — Querying & Operators
MongoDB's query language supports comparison, logical, array, element, and evaluation operators. Queries use the same filter syntax across find(), update(), and delete().
Comparison Operators
$eq
Matches values equal to a specified value (implicit for most queries)
$ne
Matches values not equal to specified value
$gt
Greater than — numeric, date, string (lexicographic)
$gte
Greater than or equal to
$lt
Less than
$lte
Less than or equal to
$in
Matches any value in the provided array
$nin
Matches none of the values in the array
Logical Operators
$and
Join query clauses — all must be true (default for multiple fields)
$or
At least one clause must match
$nor
None of the clauses match
$not
Inverts a single operator expression
Array Operators
$all
Array must contain all specified values
$elemMatch
At least one array element matches all criteria
$size
Array has exact number of elements
Element & Evaluation Operators
$exists
Field exists (true) or does not exist (false)
$type
Field matches a BSON type (e.g. 'string', 'int', 'date')
$regex
String matches a regular expression pattern
$expr
Use aggregation expressions inside a query filter
$mod
Field value modulo divisor equals remainder
// Comparison: orders between £500–£5000 db.orders.find({ total: { $gte: 500, $lte: 5000 }, status: { $in: ["pending", "processing"] } }) // Logical OR — high-value or urgent db.orders.find({ $or: [ { total: { $gt: 10000 } }, { priority: "urgent" } ] }) // elemMatch: item with qty>1 AND price>500 db.orders.find({ items: { $elemMatch: { qty: { $gt: 1 }, price: { $gt: 500 } } } }) // $expr: find orders where spent > budget db.projects.find({ $expr: { $gt: ["$spent", "$budget"] } }) // $exists + $regex combo db.products.find({ discount: { $exists: true }, name: { $regex: /headphone/i } }) // Dot notation — nested + array index db.orders.find({ "items.0.sku": "ABC" }) db.users.find({ "address.city": "Bangalore" })
Level 1 — Setup & Core Concepts
Get MongoDB running locally and understand its fundamental building blocks before writing a single query.
Install MongoDB Community Edition
Download from mongodb.com or use a package manager. On macOS: brew tap mongodb/brew && brew install [email protected]. On Ubuntu: sudo apt-get install -y mongodb-org. Start the service: brew services start [email protected] (macOS) or sudo systemctl start mongod (Linux).
Connect with mongosh
MongoDB Shell (mongosh) is the official CLI. Run mongosh to connect to a local instance at mongodb://127.0.0.1:27017. For Atlas: mongosh "mongodb+srv://cluster0.example.mongodb.net/" --apiVersion 1 --username admin.
Core Concepts: Database → Collection → Document
A document is a BSON record — like a JSON object but binary-encoded with richer types (ObjectId, Date, Decimal128, etc.). A collection is a group of documents — schema-less by default. A database is a namespace holding collections. MongoDB auto-creates a collection on first insert.
Essential Shell Commands
Switch database: use mydb. List databases: show dbs (only shows DBs with data). List collections: show collections. Current DB: db. Drop a collection: db.users.drop(). Drop a database: db.dropDatabase().
ObjectId — MongoDB's Primary Key
Every document auto-gets a _id of type ObjectId unless you provide one. An ObjectId is 12 bytes: 4-byte timestamp + 5-byte random + 3-byte incrementing counter. It encodes creation time, so ObjectId.getTimestamp() returns the date it was created.
// Connect and explore use ecommerce // Insert creates the collection automatically db.products.insertOne({ name: "Wireless Headphones", price: 2999, stock: 120, tags: ["electronics", "audio"], specs: { battery: "30h", bluetooth: "5.3" }, createdAt: new Date() }) // ObjectId encodes creation time const id = ObjectId() console.log(id.getTimestamp()) // ISODate // Show all collections show collections // Output: products
Level 4 — Indexing
Without indexes, MongoDB performs a full collection scan (COLLSCAN) on every query. Indexes store a sorted subset of fields for fast lookups (IXSCAN). Follow the ESR guideline for compound indexes: Equality fields first, then Sort fields, then Range fields.
Index on one field. The _id field is always indexed automatically. Supports ascending (1) or descending (-1) order.
db.users.createIndex({ email: 1 })
// with unique constraint
db.users.createIndex({ email: 1 }, { unique: true })
Follow ESR: Equality fields first, Sort fields second, Range fields last. Supports prefix queries. Up to 32 fields.
// ESR: role (equality) → createdAt (sort) → age (range)
db.users.createIndex({ role: 1, createdAt: -1, age: 1 })
Auto-created when indexing an array field. MongoDB indexes each element separately. At most one multikey field per compound index.
// tags is an array — automatically becomes multikey
db.products.createIndex({ tags: 1 })
Enables $text full-text search. One per collection. Supports language-specific stemming and weighted fields.
db.articles.createIndex(
{ title: "text", body: "text" },
{ weights: { title: 10, body: 1 } }
)
db.articles.find({ $text: { $search: "mongodb indexing" } })
Auto-deletes documents after set seconds. Only on single Date fields. Background thread runs every 60s — deletion is not immediate.
db.sessions.createIndex(
{ createdAt: 1 },
{ expireAfterSeconds: 3600 }
)
Indexes only documents matching a partialFilterExpression. Smaller and faster than a full index. Query must include the filter condition.
db.users.createIndex(
{ email: 1 },
{ partialFilterExpression: { active: { $eq: true } } }
)
Enforces uniqueness across the collection. Null values are also checked. Combine with partial filter to allow multiple nulls.
// Unique compound — username unique per org
db.users.createIndex(
{ orgId: 1, username: 1 },
{ unique: true }
)
Supports $near, $geoWithin, $geoIntersects on GeoJSON. Store as { type: 'Point', coordinates: [lng, lat] }.
db.places.createIndex({ location: "2dsphere" })
db.places.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [77.59, 12.97] },
$maxDistance: 1000
}
}
})
Indexes the hash of a field's value. Used for hash-based sharding for even data distribution. Supports equality queries only — not range.
db.users.createIndex({ userId: "hashed" })
Diagnose with explain(): Run db.collection.explain('executionStats').find({...}) and check winningPlan.stage. IXSCAN = index used ✅. COLLSCAN = full scan ❌. Check totalDocsExamined vs nReturned — the closer they are, the more selective your index.
MongoDB vs SQL — Concept Mapping
Coming from a relational background? Here's how SQL concepts map to MongoDB.
| Concept | SQL Equivalent | MongoDB Term | Notes |
|---|---|---|---|
| Storage unit | Row | Document | BSON format, max 16 MB |
| Group of records | Table | Collection | Schema-less by default |
| Namespace | Database | Database | Same concept |
| Primary key | id (auto-increment) | _id (ObjectId) | Auto-generated, globally unique |
| Join | JOIN | $lookup | Used in aggregation pipeline |
| Filter rows | WHERE | $match / find() | Query operators like $gt, $in |
| Group & aggregate | GROUP BY | $group | Part of aggregation pipeline |
| Index | INDEX | createIndex() | Supports compound, text, TTL, geo |