How to Implement API Rate Limiting: A Complete Guide for Application Developers
Rate limiting prevents abuse, protects against DDoS, and controls API costs. Here's how to implement it correctly in Next.js with the right algorithms and headers.
Rate limiting is a non-negotiable requirement for any public-facing API. Without it, a single bad actor or runaway client can exhaust your server resources, drive up your database costs, or trigger overage charges on third-party APIs you call on your users' behalf. The implementation is straightforward, but the details matter: which algorithm you choose, which headers you return, how you differentiate limits by route, and how you handle the limit gracefully on the client side.
Why You Need Rate Limiting
Prevent abuse. A malicious actor attempting to brute-force login credentials will try thousands of combinations per minute if unconstrained. A rate limit on your /api/auth/login route cuts that to a handful of attempts per window, which makes brute-forcing from any single source impractical.
Protect against DDoS. Distributed denial-of-service attacks work by overwhelming your server with requests. Rate limiting per IP address limits how much damage any single source can do, giving you time to respond.
Control costs. If your API calls an LLM (OpenAI, Anthropic) or another metered service for each request, an unrestrained client generates unbounded costs. Rate limiting per user ensures costs are proportional to legitimate use.
Enforce fair use. For multi-tenant SaaS products, rate limiting ensures one tenant's usage doesn't degrade service for others.
The Three Rate Limiting Algorithms
Fixed window. Count requests in a fixed time window (e.g., 100 requests per minute). Simple to implement. The weakness: a client can send 100 requests at 00:00:59 and 100 more at 00:01:00, effectively 200 requests within about two seconds, because the counter resets at the window boundary. Burst-prone at window boundaries.
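To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed-window counter. The class and method names are illustrative, not from any library, and it assumes a single process; a production setup would keep the counters in a shared store such as Redis.

```typescript
// Illustrative single-process fixed-window limiter (hypothetical names).
class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    // Align to fixed boundaries, e.g. the start of each minute.
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in a new window: the counter starts over.
      entry = { windowStart, count: 0 };
      this.counts.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

With a limit of 2 per minute, two requests at second 59 and one more at second 60 are all allowed, because the third lands in a fresh window. That is the boundary burst described above.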
Sliding window. Track requests over a rolling time period rather than a fixed window. If the limit is 100 requests per minute and the current timestamp is 01:00:30, count requests since 00:59:30. Smoother than fixed window, eliminates the burst problem. More expensive to implement (requires storing timestamps per request, or approximating with a combination of two fixed windows).
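The timestamp-per-request variant can be sketched as follows. This is an in-memory illustration with hypothetical names, not the algorithm any particular library uses; libraries often prefer the two-window approximation to avoid storing every timestamp.

```typescript
// Illustrative sliding-window log: stores one timestamp per allowed request.
class SlidingWindowLog {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that fall inside the rolling window (cutoff, now].
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Memory grows with the limit, since each key stores up to `limit` timestamps. That is the cost the paragraph above refers to.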
Token bucket. Each client has a bucket that refills at a constant rate (e.g., 10 tokens per second, max capacity 100 tokens). Each request costs 1 token. If the bucket is empty, the request is rejected. Allows controlled bursting (up to bucket capacity) while enforcing an average rate. Most flexible, slightly more complex to implement correctly.
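A minimal sketch of a single client's bucket, assuming continuous refill based on elapsed time. The names are illustrative; in practice you would keep one bucket per client (for example in a Map keyed by user ID) or store the state in Redis.

```typescript
// Illustrative token bucket for one client (hypothetical names).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefill = now;
  }

  // Returns true and spends one token if a token is available.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    // Add the tokens earned since the last call, never exceeding capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = Math.max(this.lastRefill, now);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The capacity sets the largest burst a client can send at once, while the refill rate sets the sustained average.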
For most web APIs: use sliding window for login and sensitive endpoints, token bucket for general API endpoints where you want to allow brief bursts.
Upstash Redis is serverless-compatible (HTTP-based, works in Next.js Edge Runtime and serverless functions) and has a free tier. The @upstash/ratelimit library implements sliding window and token bucket on top of Upstash Redis.
Install:
pnpm add @upstash/ratelimit @upstash/redis
Configure Upstash:
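Create a Redis database at upstash.com and add its REST URL and REST token to your .env.local as UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN. These are the variable names that Redis.fromEnv() reads.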
Always return rate limit information in response headers. This lets well-behaved clients back off before hitting limits.
| Header | Example value | Description |
| --- | --- | --- |
| X-RateLimit-Limit | 100 | Maximum requests allowed in the window |
| X-RateLimit-Remaining | 87 | Requests remaining in current window |
| X-RateLimit-Reset | 1716145200000 | Unix timestamp (ms) when limit resets |
| Retry-After | 43 | Seconds until limit resets (only on 429) |
The Retry-After header is the most important: it tells clients exactly when to retry, preventing them from hammering your API with retry loops.
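One way to build these headers is a small helper like the sketch below. The `LimitResult` shape is an assumption: it is modeled on the `success`, `limit`, `remaining`, and `reset` fields a limiter typically returns, with `reset` taken to be a Unix timestamp in milliseconds. The function name is hypothetical.

```typescript
// Assumed shape of a limiter result; `reset` is a Unix timestamp in milliseconds.
interface LimitResult {
  success: boolean;
  limit: number;
  remaining: number;
  reset: number;
}

// Builds the response headers from the table; `now` is injectable for testing.
function rateLimitHeaders(result: LimitResult, now: number = Date.now()): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(Math.max(0, result.remaining)),
    "X-RateLimit-Reset": String(result.reset),
  };
  if (!result.success) {
    // Retry-After is expressed in whole seconds, rounded up so clients never retry early.
    headers["Retry-After"] = String(Math.max(0, Math.ceil((result.reset - now) / 1000)));
  }
  return headers;
}
```

On a rejected request, return these headers together with a 429 status. On an allowed request, attach them to the normal response so clients can see how much budget remains.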
Different Limits for Different Routes
Not all routes need the same limit. A practical tiered approach:
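// lib/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the environment
const redis = Redis.fromEnv();

// Login: 10 attempts per 15 minutes (strict - prevents brute force)
export const loginRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "15 m"),
});

// Password reset: 3 requests per hour (prevent email bombing)
export const passwordResetRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, "1 h"),
});

// API endpoints: 200 requests per minute per user
// tokenBucket(refillRate, interval, maxTokens): the bucket never holds more than
// maxTokens, so maxTokens must be at least 200 for the full per-minute budget to be usable
export const apiRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.tokenBucket(200, "1 m", 200),
});

// AI endpoints (expensive): 20 requests per hour per user
export const aiRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(20, "1 h"),
});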
Apply the appropriate limiter by route type rather than using one global limit.
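A small lookup function is one way to keep the route-to-tier mapping in a single place. This is a sketch: apart from /api/auth/login, the paths are hypothetical, and the returned names are assumed to correspond to the four tiers described above.

```typescript
// Tier names are illustrative; map them to your own limiter instances.
type LimiterName = "login" | "passwordReset" | "ai" | "api";

// Picks the strictest matching tier for a request path.
function limiterForPath(pathname: string): LimiterName {
  if (pathname === "/api/auth/login") return "login";
  // Hypothetical route names below; adjust to your own API layout.
  if (pathname === "/api/auth/reset-password") return "passwordReset";
  if (pathname.startsWith("/api/ai/")) return "ai";
  return "api"; // default tier for everything else
}
```

Checking the most specific routes first means a new endpoint falls back to the general API tier instead of going unprotected.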
Limiting by User ID vs. IP Address
For unauthenticated endpoints (login, signup, password reset), limit by IP address - it's the only identifier available.
For authenticated endpoints, limit by user ID instead of IP address. This prevents the problem where multiple users behind a NAT (a corporate network, for example) share the same IP and collectively hit a limit that was meant for a single bad actor.
// For authenticated routes
const userId = user.id; // from getCurrentUser()
const { success, remaining } = await apiRateLimit.limit(userId);
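The choice between the two identifiers can be wrapped in one helper, sketched below. The function name and key prefixes are hypothetical. It also assumes your proxy sets a trustworthy x-forwarded-for header, which depends on your hosting setup.

```typescript
// Returns the identifier to pass to a limiter: user ID when signed in, otherwise client IP.
function rateLimitKey(userId: string | null, forwardedFor: string | null): string {
  if (userId) return `user:${userId}`;
  // x-forwarded-for can hold a comma-separated chain; the first entry is the
  // client address as reported by the outermost proxy.
  const ip = forwardedFor?.split(",")[0]?.trim();
  return `ip:${ip || "unknown"}`;
}
```

The prefixes keep user keys and IP keys from colliding if both end up in the same Redis database.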
Handling Rate Limit Errors on the Client
On the client side, handle 429 responses by reading the Retry-After header and waiting before retrying:
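A minimal sketch of that retry loop follows. The function name, the injectable `fetchFn` and `wait` parameters, and the exponential-backoff fallback are illustrative choices, not taken from the original article. It assumes Retry-After is given in seconds; an HTTP-date value falls through to the backoff path.

```typescript
// Minimal response shape so the sketch works with fetch or a test double.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}
type FetchLike = (url: string) => Promise<ResponseLike>;

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Retries on 429 up to maxRetries times, waiting as long as the server asks.
async function fetchWithRetry(
  url: string,
  fetchFn: FetchLike,
  maxRetries: number = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<ResponseLike> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url);
    // Return anything that is not a 429, or the last 429 once retries are used up.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfter = Number(res.headers.get("Retry-After"));
    // Fall back to exponential backoff if the header is missing or not a number of seconds.
    const delayMs =
      Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : 1000 * 2 ** attempt;
    await wait(delayMs);
  }
}
```

In a browser or Node 18+, it could be called as `fetchWithRetry("/api/data", (url) => fetch(url))`. Capping the number of retries keeps a persistent 429 from turning into an endless loop.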
For user-facing UI, surface the rate limit clearly. Instead of a generic error, show: "You've made too many login attempts. Please try again in 5 minutes." Include a countdown timer if the UX warrants it.
Testing Your Rate Limiting
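Verify your limits work before relying on them. A load-testing tool such as k6 can send more requests than the limit allows and confirm that the excess requests receive 429 responses with a Retry-After header.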
Written by Mahmudul Haque Qudrati, CEO & ML Engineer
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.