Rate limiting is a non-negotiable requirement for any public-facing API. Without it, a single bad actor or runaway client can exhaust your server resources, drive up your database costs, or trigger overage charges on third-party APIs you call on their behalf. The implementation is straightforward but the details matter: which algorithm you choose, which headers you return, how you differentiate limits by route, and how you handle the limit gracefully on the client side.
Why You Need Rate Limiting
Prevent abuse. A malicious actor attempting to brute-force login credentials will try thousands of combinations per minute if unconstrained. A rate limit on your /api/auth/login route makes this attack impractical: at 10 attempts per 15 minutes, a single source gets fewer than 1,000 guesses per day.
Protect against DDoS. Distributed denial-of-service attacks work by overwhelming your server with requests. Rate limiting per IP address caps how much load any single source can generate. It will not absorb a large distributed attack on its own, but it reduces the damage and gives you time to respond.
Control costs. If your API calls an LLM (OpenAI, Anthropic) or another metered service for each request, an unrestrained client generates unbounded costs. Rate limiting per user ensures costs are proportional to legitimate use.
Enforce fair use. For multi-tenant SaaS products, rate limiting ensures one tenant's usage doesn't degrade service for others.
The Three Rate Limiting Algorithms
Fixed window. Count requests in a fixed time window (e.g., 100 requests per minute). Simple to implement. The weakness: a client can send 100 requests at 00:59 and 100 more at 01:00, effectively 200 requests in 2 seconds. Burst-prone at window boundaries.
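The boundary burst is easy to reproduce with a small in-memory counter. The sketch below is illustrative only: `FixedWindowLimiter` and its `allow` method are hypothetical names, state lives in a single process rather than in Redis, and the `now` parameter exists so the clock can be controlled in the demo.

```typescript
// Minimal in-memory fixed-window counter (illustrative; not shared across processes).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    // Align the window to fixed boundaries (e.g., the start of each minute).
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in a new window: reset the counter.
      this.counts.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests per minute: one batch at 00:59, another at 01:00.
const fixedDemo = new FixedWindowLimiter(100, 60_000);
let fixedAllowed = 0;
for (let i = 0; i < 100; i++) if (fixedDemo.allow("client-a", 59_000)) fixedAllowed++;
for (let i = 0; i < 100; i++) if (fixedDemo.allow("client-a", 60_000)) fixedAllowed++;
console.log(fixedAllowed); // 200: both batches pass, one second apart
```

Because the counter resets at the minute boundary, the second batch is counted against a fresh window even though it arrives one second after the first.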
Sliding window. Track requests over a rolling time period rather than a fixed window. If the limit is 100 requests per minute and the current timestamp is 01:00:30, count requests since 00:59:30. Smoother than fixed window, eliminates the burst problem. More expensive to implement (requires storing timestamps per request, or approximating with a combination of two fixed windows).
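A minimal sketch of both variants mentioned above, assuming in-memory state and hypothetical names (`SlidingWindowLogLimiter`, `approxSlidingCount`). The first stores one timestamp per request; the second is the common two-window approximation, which weights the previous fixed window's count by how much of it still overlaps the rolling window.

```typescript
// Sliding-window log: stores one timestamp per allowed request (illustrative, in-memory).
class SlidingWindowLogLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly log = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that fall inside the rolling window.
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.log.set(key, recent);
    return allowed;
  }
}

// Two-window approximation: estimate the rolling count from two fixed-window counters.
function approxSlidingCount(
  prevCount: number,
  currCount: number,
  elapsedInCurrentMs: number,
  windowMs: number
): number {
  const overlap = 1 - elapsedInCurrentMs / windowMs; // share of the previous window still in range
  return prevCount * overlap + currCount;
}

// Same traffic pattern as the fixed-window weakness: 100 at 00:59, 100 at 01:00.
const slidingDemo = new SlidingWindowLogLimiter(100, 60_000);
let firstBatch = 0;
let secondBatch = 0;
for (let i = 0; i < 100; i++) if (slidingDemo.allow("client-a", 59_000)) firstBatch++;
for (let i = 0; i < 100; i++) if (slidingDemo.allow("client-a", 60_000)) secondBatch++;
console.log(firstBatch, secondBatch); // 100 0
console.log(approxSlidingCount(100, 0, 30_000, 60_000)); // 50
```

The log variant is exact but its memory grows with the limit; the approximation needs only two counters per client at the cost of assuming requests were evenly spread across the previous window.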
Token bucket. Each client has a bucket that refills at a constant rate (e.g., 10 tokens per second, max capacity 100 tokens). Each request costs 1 token. If the bucket is empty, the request is rejected. Allows controlled bursting (up to bucket capacity) while enforcing an average rate. Most flexible, slightly more complex to implement correctly.
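The refill arithmetic can be sketched in a few lines. This is a single-bucket, in-memory illustration using the numbers from the paragraph above (capacity 100, 10 tokens per second); `TokenBucket` and `tryConsume` are hypothetical names, and a real limiter would keep one bucket per client in shared storage.

```typescript
// Single token bucket with continuous refill (illustrative, in-memory).
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefill = now;
  }

  // Returns true and spends one token if the request is allowed.
  tryConsume(now: number = Date.now()): boolean {
    // Add tokens for the time elapsed since the last call, capped at capacity.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(100, 10, 0);
let burst = 0;
for (let i = 0; i < 150; i++) if (bucket.tryConsume(0)) burst++;
let afterOneSecond = 0;
for (let i = 0; i < 150; i++) if (bucket.tryConsume(1_000)) afterOneSecond++;
console.log(burst, afterOneSecond); // 100 10
```

The first loop shows the burst allowance (the full bucket), and the second shows the sustained rate once the bucket has been drained.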
For most web APIs: use sliding window for login and sensitive endpoints, token bucket for general API endpoints where you want to allow brief bursts.
Implementation in Next.js with Upstash Redis
Upstash Redis is serverless-compatible (it is accessed over HTTP, so it works in the Next.js Edge Runtime and in serverless functions) and has a free tier. The @upstash/ratelimit library implements fixed window, sliding window, and token bucket on top of Upstash Redis.
Install:
pnpm add @upstash/ratelimit @upstash/redis
Configure Upstash:
Create a Redis database at upstash.com and add to your .env.local:
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
Create a rate limiter utility:
// lib/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

// Strict limit for auth routes: 10 requests per 15 minutes
export const authRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "15 m"),
  analytics: true,
});

// General API limit: refills 100 tokens per minute; a capacity of 120 allows brief bursts.
// The arguments are (refillRate, interval, maxTokens). maxTokens caps the bucket,
// so a value below refillRate would hold the sustained rate under 100 per minute.
export const apiRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(100, "1 m", 120),
  analytics: true,
});
Apply to a Next.js API route:
// app/api/auth/login/route.ts
import { NextRequest, NextResponse } from "next/server";
import { authRateLimit } from "@/lib/rate-limit";

export async function POST(request: NextRequest) {
  // Get the client IP. Only trust x-forwarded-for when a proxy you control
  // (e.g., Vercel or Cloudflare) sets it; otherwise clients can spoof it.
  const ip =
    request.headers.get("x-forwarded-for")?.split(",")[0].trim() ??
    request.headers.get("x-real-ip") ??
    "anonymous";

  const { success, limit, reset, remaining } = await authRateLimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: "Too many requests. Please try again later." },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": limit.toString(),
          "X-RateLimit-Remaining": "0",
          "X-RateLimit-Reset": reset.toString(),
          // Seconds until reset, clamped so clock skew never yields a negative value
          "Retry-After": Math.max(0, Math.ceil((reset - Date.now()) / 1000)).toString(),
        },
      }
    );
  }

  // ... login logic
  return NextResponse.json(
    { token: "..." },
    {
      headers: {
        "X-RateLimit-Limit": limit.toString(),
        "X-RateLimit-Remaining": remaining.toString(),
        "X-RateLimit-Reset": reset.toString(),
      },
    }
  );
}
Rate Limit Headers
Always return rate limit information in response headers. This lets well-behaved clients back off before hitting limits.
| Header | Value | Description |
|--------|-------|-------------|
| X-RateLimit-Limit | 100 | Maximum requests allowed in the window |
| X-RateLimit-Remaining | 87 | Requests remaining in current window |
| X-RateLimit-Reset | 1716145200000 | Unix timestamp (ms) when limit resets |
| Retry-After | 43 | Seconds until limit resets (only on 429) |
The Retry-After header is the most important: it tells clients exactly when to retry, preventing them from hammering your API with retry loops.
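To avoid repeating the header block in every route, the mapping in the table can be pulled into one helper. This is a sketch, not part of any library: `rateLimitHeaders` and the `RateLimitResult` interface are hypothetical names, and the interface only mirrors the `success`, `limit`, `remaining`, and `reset` fields used in the route above.

```typescript
// Shape of the fields used from the limiter result (reset is a Unix timestamp in ms).
interface RateLimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
}

// Builds the rate limit headers; Retry-After is only included when the request was rejected.
function rateLimitHeaders(
  result: RateLimitResult,
  now: number = Date.now()
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(Math.max(0, result.remaining)),
    "X-RateLimit-Reset": String(result.reset),
  };
  if (!result.success) {
    // Whole seconds until the limit resets, never negative.
    headers["Retry-After"] = String(Math.max(0, Math.ceil((result.reset - now) / 1000)));
  }
  return headers;
}

const rejected = rateLimitHeaders(
  { success: false, limit: 100, remaining: 0, reset: 1716145200000 },
  1716145200000 - 42_500
);
console.log(rejected["Retry-After"]); // "43"
```

A route could then pass the returned object straight into the `headers` option of its response for both the 429 and the success case.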
Different Limits for Different Routes
Not all routes need the same limit. A practical tiered approach:
// lib/rate-limit.ts
// Tiered limiters; assumes the imports and `redis` client from the earlier example.

// Login: 10 attempts per 15 minutes (strict, prevents brute force)
export const loginRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "15 m"),
});

// Password reset: 3 requests per hour (prevents email bombing)
export const passwordResetRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, "1 h"),
});

// API endpoints: refills 200 tokens per minute per user; capacity of 250 allows brief bursts.
// maxTokens (third argument) caps the bucket, so it must not be lower than the refill rate.
export const apiRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(200, "1 m", 250),
});

// AI endpoints (expensive): 20 requests per hour per user
export const aiRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(20, "1 h"),
});
Apply the appropriate limiter by route type rather than using one global limit.
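One way to keep the route-to-limiter mapping in a single place is a small prefix table. The sketch below is an assumption about how routes might be organized: only `/api/auth/login` comes from this article, while `/api/auth/reset-password`, `/api/ai/`, and the `tierForPath` helper are hypothetical.

```typescript
// Names of the tiers defined above.
type Tier = "login" | "passwordReset" | "ai" | "api";

// Prefix rules, checked in order. Paths other than /api/auth/login are hypothetical.
const TIER_RULES: Array<{ prefix: string; tier: Tier }> = [
  { prefix: "/api/auth/login", tier: "login" },
  { prefix: "/api/auth/reset-password", tier: "passwordReset" },
  { prefix: "/api/ai/", tier: "ai" },
];

// Returns the tier for a request path, falling back to the general API tier.
function tierForPath(pathname: string): Tier {
  const match = TIER_RULES.find((rule) => pathname.startsWith(rule.prefix));
  return match ? match.tier : "api";
}

console.log(tierForPath("/api/auth/login")); // "login"
console.log(tierForPath("/api/projects/42")); // "api"
```

The returned tier name can then index into an object holding the corresponding limiter instances, for example in middleware that runs before every API route.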
Limiting by User ID vs. IP Address
For unauthenticated endpoints (login, signup, password reset), limit by IP address. Before the user has authenticated, it is the only identifier you can rely on for every request.
For authenticated endpoints, limit by user ID instead of IP address. This prevents the problem where multiple users behind a NAT (a corporate network, for example) share the same IP and collectively hit a limit that was meant for a single bad actor.
// For authenticated routes
const userId = user.id; // from getCurrentUser()
const { success, remaining } = await apiRateLimit.limit(userId);
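The choice between the two identifiers can be captured in one helper so every route derives its key the same way. This is a sketch under assumptions: `rateLimitKey` is a hypothetical name, and the `user:` and `ip:` prefixes are an illustrative convention to keep the two key spaces from colliding.

```typescript
// Picks the rate limit key: user ID when authenticated, otherwise the client IP.
function rateLimitKey(userId: string | null, forwardedFor: string | null): string {
  if (userId) return `user:${userId}`;
  // x-forwarded-for may hold a comma-separated chain; the first entry is the client.
  const ip = forwardedFor?.split(",")[0]?.trim();
  return ip ? `ip:${ip}` : "ip:anonymous";
}

console.log(rateLimitKey("u_123", "203.0.113.7")); // "user:u_123"
console.log(rateLimitKey(null, "203.0.113.7, 10.0.0.1")); // "ip:203.0.113.7"
```

With this in place, two colleagues behind the same corporate NAT get separate buckets as soon as they are signed in, while anonymous traffic from that network still shares one.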
Handling Rate Limit Errors on the Client
On the client side, handle 429 responses by reading the Retry-After header and waiting before retrying:
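async function fetchWithRetry(
  url: string,
  options?: RequestInit,
  maxRetries = 3
): Promise<Response> {
  const response = await fetch(url, options);
  // Once the retry budget is spent, return the 429 to the caller instead of looping forever.
  if (response.status === 429 && maxRetries > 0) {
    // Retry-After may be missing or an HTTP date; fall back to 60 seconds.
    const parsed = Number.parseInt(response.headers.get("Retry-After") ?? "", 10);
    const retryAfter = Number.isFinite(parsed) && parsed >= 0 ? parsed : 60;
    console.warn(`Rate limited. Retrying after ${retryAfter}s`);
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
    return fetchWithRetry(url, options, maxRetries - 1);
  }
  return response;
}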
For user-facing UI, surface the rate limit clearly. Instead of a generic error, show: "You've made too many login attempts. Please try again in 5 minutes." Include a countdown timer if the UX warrants it.
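Turning the Retry-After value into that kind of message takes only a little formatting. The helper below is a sketch; `retryMessage` is a hypothetical name and the wording is just an example.

```typescript
// Converts a Retry-After value (in seconds) into a human-readable message.
function retryMessage(retryAfterSeconds: number): string {
  const seconds = Math.max(0, Math.ceil(retryAfterSeconds));
  if (seconds < 60) {
    return `Too many attempts. Please try again in ${seconds} second${seconds === 1 ? "" : "s"}.`;
  }
  // Round up so the user is never told to retry too early.
  const minutes = Math.ceil(seconds / 60);
  return `Too many attempts. Please try again in ${minutes} minute${minutes === 1 ? "" : "s"}.`;
}

console.log(retryMessage(300)); // "Too many attempts. Please try again in 5 minutes."
console.log(retryMessage(43)); // "Too many attempts. Please try again in 43 seconds."
```

For a countdown timer, the same function can be called once per second with the remaining time.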
Testing Your Rate Limiting
Verify your limits work before relying on them. With k6:
import http from "k6/http";
import { check } from "k6";

export const options = {
  vus: 1,
  iterations: 15,
};

export default function () {
  const res = http.post("https://api.myapp.com/auth/login", JSON.stringify({
    email: "test@example.com",
    password: "wrong-password",
  }), { headers: { "Content-Type": "application/json" } });

  // k6 exposes response header names in canonical form, so the key is
  // "X-Ratelimit-Remaining" rather than "X-RateLimit-Remaining".
  console.log(`Status: ${res.status}, Remaining: ${res.headers["X-Ratelimit-Remaining"]}`);

  check(res, {
    "status is 401 or 429": (r) => r.status === 401 || r.status === 429,
  });
}
Run this with k6 run rate-limit-test.js and verify that the 11th request (beyond your 10/15m limit) returns 429 with the correct headers.