Retry Strategies
Retrying blindly can make things worse. How you retry matters as much as whether you retry.
Immediate Retry
Retry instantly on failure. Only appropriate for transient errors like a momentary network blip. Never retry immediately on 5xx errors — the server is already under stress, and hammering it makes recovery slower.
Fixed Delay
Wait a fixed interval between retries, e.g. retry after 1s, 1s, 1s. Simple to implement, but clients that failed at the same moment also retry at the same moment, so a recovering server is hit by synchronized waves of requests (a thundering herd).
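To make the thundering herd concrete, here is a small simulation sketch. The helper name `firstRetryTimes` and its parameters are made up for illustration; it only computes when each client would fire its first retry.

```javascript
// Hypothetical simulation: when do clients that failed together retry?
function firstRetryTimes(clientCount, failedAtMs, delayFor) {
  return Array.from({ length: clientCount }, () => failedAtMs + delayFor());
}

// Fixed delay: every client waits exactly 1000 ms.
const fixed = firstRetryTimes(5, 0, () => 1000);
console.log(new Set(fixed).size); // 1 (all five clients retry at the same instant)

// Randomized delay: retries are spread across the 0-1000 ms window.
const jittered = firstRetryTimes(5, 0, () => Math.random() * 1000);
console.log(jittered.map(Math.round)); // values vary per run
```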
Exponential Backoff
Double the wait time on each retry: 1s, 2s, 4s, 8s. This spreads retries further apart as failures accumulate, and it is the standard approach for production systems. The AWS SDKs retry this way by default, and many HTTP client libraries support it either built in or through a plugin such as axios-retry.
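As a minimal sketch of that schedule, the helper below computes the un-jittered delays. The name `backoffDelays` and the cap parameter are assumptions for illustration, not part of any library.

```javascript
// Hypothetical helper: un-jittered exponential backoff schedule in ms.
// The delay starts at baseMs, doubles on each attempt, and is clamped to capMs.
function backoffDelays(baseMs, capMs, retries) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

console.log(backoffDelays(1000, 8000, 5)); // [ 1000, 2000, 4000, 8000, 8000 ]
```

The cap keeps a long outage from pushing waits into minutes; without it the fifth delay would be 16s.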
Exponential Backoff + Jitter (recommended)
Add random jitter to exponential backoff — wait = random(0, min(cap, base * 2^attempt)). The jitter prevents all retrying clients from hitting the server at the exact same moment. This is what AWS recommends and what every high-traffic system should use.
```javascript
// Exponential backoff with jitter, production ready
async function fetchWithRetry(url, options, maxRetries = 4) {
  const BASE_DELAY = 500 // ms
  const CAP = 8000 // ms max delay

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetch(url, options)
      // Don't retry client errors (4xx): they won't change
      if (res.status >= 400 && res.status < 500) return res
      if (res.ok) return res
      throw new Error(`Server error: ${res.status}`)
    } catch (err) {
      if (attempt === maxRetries) throw err
      // Full jitter: random between 0 and capped exponential
      const exp = Math.min(CAP, BASE_DELAY * 2 ** attempt)
      const delay = Math.random() * exp
      console.log(`Retry ${attempt + 1} in ${Math.round(delay)}ms`)
      await new Promise(r => setTimeout(r, delay))
    }
  }
}

// Usage: always pass idempotency key for non-GET requests
await fetchWithRetry('/payments', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Idempotency-Key': crypto.randomUUID() // generated once, before first attempt
  },
  body: JSON.stringify({ amount: 2999, currency: 'INR' })
})
```
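Never retry on 4xx errors such as 400 Bad Request, 401 Unauthorized, or 422 Unprocessable Entity: the request itself is at fault, so the result won't change no matter how many times you retry. Retry only on network errors, timeouts, and 5xx server errors. Some clients also retry 408 Request Timeout and 429 Too Many Requests, which describe temporary conditions rather than a bad request.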
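The rule can be written as a small predicate. This is a sketch only; `isRetryable` is a hypothetical name, and it assumes the caller passes `undefined` when no HTTP response arrived at all.

```javascript
// Hypothetical helper: decide whether a failed attempt is worth retrying.
// `status` is the HTTP status code, or undefined when no response arrived
// (network error or timeout).
function isRetryable(status) {
  if (status === undefined) return true;            // network error or timeout
  if (status >= 500 && status <= 599) return true;  // server error
  // 2xx, 3xx, and 4xx are not retried. Some APIs also treat 408 and 429
  // as retryable; that refinement is left out of this sketch.
  return false;
}

console.log(isRetryable(503));       // true
console.log(isRetryable(422));       // false
console.log(isRetryable(undefined)); // true
```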
Real-World Scenarios
Idempotency isn't a theoretical concern. These are the exact situations where it breaks production systems.
Payment APIs — The Double-Charge Problem
User clicks "Pay". The request reaches your server, the charge is processed, but the response times out before reaching the client. The client retries. You've now charged the user twice. This is the most expensive non-idempotency bug in production. Stripe, Razorpay, and every serious payment gateway solve this with idempotency keys.
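The scenario can be replayed with a toy in-memory service. Everything here (`charge`, `chargesByKey`, the `ch_` ID format) is a hypothetical stand-in; a real gateway would persist keys in a database and handle concurrency.

```javascript
// Hypothetical in-memory payment service keyed by idempotency key.
const chargesByKey = new Map();
let totalCharged = 0;

function charge(idempotencyKey, amount) {
  // A retry with the same key returns the original result without charging again.
  if (chargesByKey.has(idempotencyKey)) return chargesByKey.get(idempotencyKey);
  totalCharged += amount;
  const result = { chargeId: `ch_${chargesByKey.size + 1}`, amount };
  chargesByKey.set(idempotencyKey, result);
  return result;
}

// The first attempt succeeds on the server, but assume the response is lost.
charge('key-123', 2999);
// The client retries with the same key.
const retry = charge('key-123', 2999);
console.log(totalCharged, retry.chargeId); // 2999 ch_1
```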
Order Creation — Duplicate Orders
A mobile app on a flaky connection submits an order. The server creates it, but the 201 response drops. The app retries. Now there are two identical orders for the same user. Without deduplication logic, fulfilment picks up both and ships twice.
Email & Notification APIs
Your order confirmation service calls an email API. A transient error causes a retry at the infrastructure level (an SQS redelivery, a Kafka consumer restart, a cron job rerun). The user receives the same confirmation email three times. This erodes trust faster than most bugs.
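One common defence is to deduplicate on a message ID in the consumer. The sketch below assumes each queue message carries a stable `messageId`; `sendEmail`, `processed`, and `handleMessage` are illustrative names, and the in-memory `Set` stands in for a durable store.

```javascript
// Hypothetical consumer for an at-least-once queue.
const processed = new Set(); // would be a durable store in production
const sent = [];

// Stand-in for the real email API call.
function sendEmail(to, template) {
  sent.push({ to, template });
}

function handleMessage(message) {
  // Redelivered messages carry the same messageId, so they are skipped.
  if (processed.has(message.messageId)) return false;
  sendEmail(message.to, message.template);
  processed.add(message.messageId);
  return true;
}

const msg = { messageId: 'm-1', to: 'user@example.com', template: 'order-confirmation' };
handleMessage(msg);
handleMessage(msg); // redelivery after a transient error
handleMessage(msg);
console.log(sent.length); // 1
```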
Inventory Updates — Race Conditions
Two retried requests to decrement stock arrive simultaneously. Both read stock: 1, both decrement, both write stock: 0. Two units have been sold, but the database recorded only one decrement (a lost update), so you've oversold. This is where idempotency intersects with concurrency control.
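One way to close this gap is optimistic locking: a write only succeeds if the row has not changed since it was read. The sketch below simulates that with a version counter on an in-memory object; `row` and `writeStock` are hypothetical, and the SQL in the comment shows the kind of conditional update being imitated.

```javascript
// Hypothetical in-memory row with a version column (optimistic locking).
const row = { stock: 1, version: 1 };

// Simulates: UPDATE items SET stock = ?, version = version + 1
//            WHERE id = ? AND version = ?
function writeStock(newStock, expectedVersion) {
  if (row.version !== expectedVersion) return false; // someone else wrote first
  row.stock = newStock;
  row.version += 1;
  return true;
}

// Both requests read the same snapshot before either one writes.
const readA = { ...row };
const readB = { ...row };

const okA = writeStock(readA.stock - 1, readA.version);
const okB = writeStock(readB.stock - 1, readB.version);
console.log(okA, okB, row.stock); // true false 0
```

The second writer is rejected and has to re-read, at which point it sees stock: 0 and can refuse the sale.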
Handling the In-Flight Race Condition
The basic idempotency-key middleware has a gap: two retries arriving at almost the same moment both find no cached result and both start processing. Here's the production-safe version.

```javascript
// Assumes `client` is a connected node-redis v4 client
async function idempotencyMiddleware(req, res, next) {
  const key = req.headers['idempotency-key']
  if (!key) return next()

  const lockKey = `idem:lock:${key}`
  const resultKey = `idem:result:${key}`

  // Replay a completed result if one is stored
  const replay = async () => {
    const cached = await client.get(resultKey)
    if (!cached) return false
    const { status, body } = JSON.parse(cached)
    res.status(status).json(body)
    return true
  }

  // Check for completed result first
  if (await replay()) return

  // Atomically acquire lock: NX = only set if key doesn't exist
  const locked = await client.set(lockKey, '1', {
    NX: true,
    EX: 30 // 30s lock TTL
  })
  if (!locked) {
    // Another request is processing: tell client to retry shortly
    return res
      .status(409)
      .json({ error: 'Request in progress, retry after 1s' })
  }

  // Re-check: the first request may have finished between the lookup and the lock
  if (await replay()) {
    await client.del(lockKey)
    return
  }

  // Intercept response to persist result + release lock
  const originalJson = res.json.bind(res)
  res.json = async (body) => {
    // Store the result before releasing the lock, so a retry that acquires
    // the lock afterwards always finds the cached result
    await client.setEx(resultKey, 86400, JSON.stringify({ status: res.statusCode, body }))
    await client.del(lockKey)
    return originalJson(body)
  }
  next()
}
```
The lock is acquired with SET NX EX — a single atomic Redis command. If the lock already exists (another request is processing), the second request immediately gets a 409. The client should wait ~1 second and retry. Once the first request finishes and stores the result, the retry will hit the cached result path.
Always set a TTL on the lock. If your server crashes mid-request without releasing the lock, the TTL ensures it eventually expires and the client can retry successfully.
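The effect of the TTL can be shown without Redis. The sketch below is an in-memory stand-in for `SET key value NX EX ttl`; `acquireLock` and the `locks` map are hypothetical, and the current time is passed in so expiry can be demonstrated without real waiting.

```javascript
// Hypothetical in-memory stand-in for Redis SET key value NX EX ttl.
const locks = new Map(); // key -> expiry timestamp in ms

function acquireLock(key, ttlMs, now) {
  const expiresAt = locks.get(key);
  if (expiresAt !== undefined && expiresAt > now) return false; // still held
  locks.set(key, now + ttlMs);
  return true;
}

console.log(acquireLock('idem:lock:abc', 30000, 0));     // true  (first request)
console.log(acquireLock('idem:lock:abc', 30000, 1000));  // false (in flight)
// Assume the server crashed and never released the lock; 30s later it has expired.
console.log(acquireLock('idem:lock:abc', 30000, 30000)); // true
```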
PUT vs POST — The Core Difference
This is where most developers get confused. The difference isn't just semantics — it has real consequences when networks fail.
```javascript
// POST: creates a NEW order every call
// Call twice → two orders in the database
app.post('/orders', async (req, res) => {
  const order = await Order.create({
    userId: req.body.userId,
    items: req.body.items,
    total: req.body.total,
    status: 'pending'
  })
  res.status(201).json(order)
})

// PUT: replaces order at a known ID
// Call twice → same result, one order
// Assumes Order is a Mongoose model; Mongoose provides findOneAndReplace
// (there is no findByIdAndReplace), so the ID goes in the filter
app.put('/orders/:id', async (req, res) => {
  const order = await Order.findOneAndReplace(
    { _id: req.params.id },
    {
      userId: req.body.userId,
      items: req.body.items,
      total: req.body.total
    },
    { new: true, upsert: true }
  )
  res.json(order)
})
```

The retry test: if this request runs twice, does the user experience the same outcome? If yes, it is idempotent. If no (double charge, duplicate record, double email), it is not idempotent, and you need to handle it explicitly.
POST creates something new. Every time you call it, it should create a new resource. The server decides the ID. POST to /orders creates a new order. Call it twice, you get two orders. That's intentional — and it's why POST is not idempotent.
PUT replaces a resource at a specific URL. The client specifies the exact location. PUT to /orders/abc123 with a full payload sets that order to exactly that state. Call it once, call it ten times — the order looks the same at the end.
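The difference is easy to see in a toy in-memory model with no web framework involved. `postOrder`, `putOrder`, and the `db` map are hypothetical names used only for this sketch.

```javascript
// Hypothetical in-memory model of POST and PUT semantics.
const db = new Map();
let nextId = 1;

// POST: the server picks a fresh ID on every call.
function postOrder(payload) {
  const id = `order-${nextId++}`;
  db.set(id, { id, ...payload });
  return id;
}

// PUT: the client names the ID; the stored value is replaced wholesale.
function putOrder(id, payload) {
  db.set(id, { id, ...payload });
  return id;
}

postOrder({ total: 2999 });
postOrder({ total: 2999 }); // a retry creates a second order
console.log(db.size); // 2

putOrder('abc123', { total: 2999 });
putOrder('abc123', { total: 2999 }); // a retry overwrites the same order
console.log(db.size); // 3
```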
HTTP Methods & Idempotency
The HTTP spec defines which methods are idempotent and which are not. This isn't a suggestion — it's a contract your API should honour.
| Method | Idempotent | Safe | Typical Use |
|---|---|---|---|
| GET | Yes | Yes | Read a resource |
| PUT | Yes | No | Replace a resource entirely |
| DELETE | Yes | No | Remove a resource |
| POST | No | No | Create a resource / trigger action |
| PATCH | No* | No | Partial update (depends on implementation) |
| HEAD | Yes | Yes | Same as GET but no body |
Safe means the request doesn't modify server state — GET and HEAD are safe. Idempotent means repeated requests produce the same state — PUT and DELETE are idempotent but not safe. These are separate properties.
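The two properties can be encoded as a lookup table and used to drive retry decisions. The table mirrors the one above; the `canAutoRetry` policy is an assumption made for illustration, not something the HTTP spec prescribes.

```javascript
// Safe and idempotent flags per method, matching the table above.
const METHODS = {
  GET:    { safe: true,  idempotent: true },
  HEAD:   { safe: true,  idempotent: true },
  PUT:    { safe: false, idempotent: true },
  DELETE: { safe: false, idempotent: true },
  POST:   { safe: false, idempotent: false },
  PATCH:  { safe: false, idempotent: false },
};

// Hypothetical policy: retry automatically only if the method is idempotent
// by spec or the request carries an idempotency key.
function canAutoRetry(method, hasIdempotencyKey) {
  const props = METHODS[method.toUpperCase()];
  if (!props) return false; // unknown method: be conservative
  return props.idempotent || hasIdempotencyKey;
}

console.log(canAutoRetry('PUT', false));  // true
console.log(canAutoRetry('POST', false)); // false
console.log(canAutoRetry('POST', true));  // true
```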
PATCH is marked with * because it can be idempotent depending on your implementation. PATCH /user/1 { "name": "Alice" } is idempotent — calling it ten times always results in name being Alice. But PATCH /counter { "op": "increment" } is not — each call changes the value.
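The two PATCH examples can be compared side by side. The handlers below are hypothetical and operate on plain objects rather than a real API.

```javascript
// Hypothetical patch handlers applied to plain objects.
function applySetName(user, patch) {
  return { ...user, name: patch.name }; // absolute value: idempotent
}

function applyIncrement(counter) {
  return { ...counter, value: counter.value + 1 }; // relative change: not idempotent
}

let user = { id: 1, name: 'Bob' };
for (let i = 0; i < 10; i++) user = applySetName(user, { name: 'Alice' });
console.log(user.name); // Alice

let counter = { value: 0 };
for (let i = 0; i < 3; i++) counter = applyIncrement(counter);
console.log(counter.value); // 3
```

A patch that states the desired end value survives retries; a patch that states a delta does not.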
TL;DR
Idempotency cheatsheet:

```text
GET / HEAD / PUT / DELETE → idempotent by spec, honour it
POST / PATCH              → not idempotent, add idempotency keys
Idempotency key           → UUID generated by client, per action
Server stores             → key + response in Redis, 24h TTL
Duplicate arrives         → return cached response, skip execution
In-flight race            → Redis SET NX lock, return 409 if locked
Retry strategy            → exponential backoff + jitter
Never retry               → 4xx errors (client's fault)
Always retry              → network timeouts, 5xx with idempotency key
Rule of thumb             → if a retry can cause harm, add an idempotency key
```
Idempotency is a design contract, not an implementation detail. If you build payment, order, or notification APIs without it, you will hit production bugs that are hard to reproduce and expensive to fix.
Idempotency Keys
An idempotency key is a unique identifier the client generates and sends with a request. The server uses it to detect duplicate requests and return the cached result instead of re-executing.
```javascript
// Express middleware: idempotency key handler
const express = require('express')
const redis = require('redis')

const app = express()
app.use(express.json())

const client = redis.createClient()
// node-redis v4+ requires an explicit connect before issuing commands
client.connect().catch(console.error)

async function idempotencyMiddleware(req, res, next) {
  const key = req.headers['idempotency-key']
  // No key: allow through (or reject depending on policy)
  if (!key) return next()

  const cached = await client.get(`idem:${key}`)
  if (cached) {
    // Duplicate request: return stored response
    const { status, body } = JSON.parse(cached)
    return res.status(status).json(body)
  }

  // Intercept res.json to cache the response
  const originalJson = res.json.bind(res)
  res.json = async (body) => {
    await client.setEx(
      `idem:${key}`,
      86400, // 24 hour TTL
      JSON.stringify({ status: res.statusCode, body })
    )
    return originalJson(body)
  }
  next()
}

// Apply to payment routes only
// processPayment is the application's own charge function (not shown)
app.post('/payments', idempotencyMiddleware, async (req, res) => {
  const charge = await processPayment(req.body)
  res.status(201).json(charge)
})
```
Watch out for the in-flight race condition: Two retries can arrive simultaneously before either one has stored the response. Use Redis SET NX (set if not exists) to atomically claim the key before processing, and return 409 Conflict if the key is already being processed.
The client generates a UUID before making the request and attaches it as a header — typically Idempotency-Key. If the request succeeds, the server stores the key and the response. If the same key arrives again (retry), the server returns the stored response without executing the operation again.
The key must be generated before the request, not after. It represents one user action, not one HTTP attempt. A new user action gets a new key. A retry of the same action reuses the same key.
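On the client, this usually means creating the key when the action is created and building every attempt's headers from it. The sketch below is one way to structure that; `createAction` and `headersForAttempt` are hypothetical names, and the default key generator assumes a runtime that exposes the Web Crypto global `crypto.randomUUID()`.

```javascript
// Hypothetical client-side helper. `generateKey` is injectable; the default
// assumes the Web Crypto global is available.
function createAction(payload, generateKey = () => crypto.randomUUID()) {
  // The key is created once, when the user action is created.
  const idempotencyKey = generateKey();
  return {
    payload,
    // Every attempt (first try or retry) builds headers from the same key.
    headersForAttempt: () => ({
      'Content-Type': 'application/json',
      'Idempotency-Key': idempotencyKey,
    }),
  };
}
```

Two clicks on "Pay" would call `createAction` twice and get two keys; a retry loop for one click calls `headersForAttempt` repeatedly and always sends the same key.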
Keys should expire after a reasonable window — Stripe uses 24 hours. After expiry, the same key is treated as a new request. Store keys in Redis with a TTL for efficient lookup.
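The expiry behaviour can be sketched with an in-memory store in place of Redis. `save`, `lookup`, and `TTL_MS` are hypothetical names, and the current time is passed in explicitly so the 24-hour window can be demonstrated without waiting.

```javascript
// Hypothetical in-memory idempotency store with expiry.
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const store = new Map(); // key -> { response, expiresAt }

function save(key, response, now) {
  store.set(key, { response, expiresAt: now + TTL_MS });
}

function lookup(key, now) {
  const entry = store.get(key);
  if (!entry) return undefined;
  if (entry.expiresAt <= now) {
    store.delete(key); // expired: the key is treated as new
    return undefined;
  }
  return entry.response;
}

save('key-1', { status: 201, body: { id: 'ch_1' } }, 0);
console.log(lookup('key-1', 1000) !== undefined);   // true  (inside the window: replay)
console.log(lookup('key-1', TTL_MS) !== undefined); // false (expired: treated as new)
```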