AI Gateways in 2026: Cloudflare vs Portkey vs LiteLLM vs Custom

An AI gateway sits between your application and LLM providers to handle routing, fallback, caching, rate limiting, and cost tracking. This guide compares Cloudflare AI Gateway, LiteLLM, Portkey, and building your own.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 17, 2026

9 min read

Tags: #ai-gateway #cloudflare #litellm #portkey #rate-limiting

What an AI Gateway Does

An AI gateway is a proxy layer that sits between your application and one or more LLM providers. It handles:

Routing — send requests to different models based on rules (cost, capability, latency)
Fallback — if OpenAI is down, automatically retry on Anthropic
Caching — return cached responses for identical or semantically similar requests
Rate limiting — enforce per-user or per-team token budgets
Cost tracking — log spend per model, per user, per endpoint
Observability — unified logging across providers

Without a gateway, you write all of this logic in your application code, duplicated across every service that calls an LLM.
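To make the rate-limiting and cost-tracking items above concrete, here is a minimal sketch of a per-user token budget. The class name `TokenBudget`, its methods, and the limit of 1000 tokens are hypothetical choices for illustration, not the API of any gateway discussed here; a production version would also need persistence and a reset window.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks tokens used per user and rejects requests over a fixed limit."""

    def __init__(self, limit_per_user: int) -> None:
        self.limit_per_user = limit_per_user
        self.used = defaultdict(int)  # user id -> tokens consumed so far

    def charge(self, user: str, tokens: int) -> bool:
        """Record usage and return True, or return False if it would exceed the budget."""
        if self.used[user] + tokens > self.limit_per_user:
            return False
        self.used[user] += tokens
        return True

    def remaining(self, user: str) -> int:
        """Tokens the user can still spend."""
        return self.limit_per_user - self.used[user]


budget = TokenBudget(limit_per_user=1000)  # illustrative limit
first = budget.charge("alice", 600)        # fits within the budget
second = budget.charge("alice", 500)       # would bring alice to 1100, so it is rejected
```

Keeping this state in one place is the point of a gateway: every service that calls an LLM shares the same counters instead of maintaining its own.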

Architecture Comparison

| | Cloudflare AI Gateway | LiteLLM | Portkey | Custom (Hono/Next.js) |
|---|---|---|---|---|
| Deployment | Managed edge | Self-hosted or cloud | SaaS | Self-hosted |
| Providers supported | Major APIs | 100+ | 100+ | You build it |
| Semantic caching | No | No | Yes | You build it |
| Fallback routing | Basic | Yes | Yes | You build it |
| Analytics | Yes | Basic | Advanced | You build it |
| Pricing | Free tier | Open source / $20+ | Free tier / $49+ | Infrastructure cost only |
| Setup time | 5 minutes | 30 minutes | 10 minutes | Days |

Cloudflare AI Gateway

Cloudflare AI Gateway is free, runs at Cloudflare's edge, and requires no infrastructure to manage. It works by changing the base URL in your existing OpenAI SDK calls:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    # Point the SDK at your gateway; substitute your own account ID and gateway name
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_name}/openai",
)

# All requests now flow through Cloudflare gateway
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Features: real-time analytics dashboard, request/response logging, rate limiting by IP or custom header, edge caching for identical requests. The free tier covers most individual and small team use cases.

Limitations: caching is exact-match only (no semantic caching), and provider fallback configuration is limited.
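Exact-match caching can be pictured as hashing the full request and using the hash as the cache key. The sketch below illustrates the idea only and is not Cloudflare's implementation; `cache_key`, `cached_completion`, and `fake_provider` are hypothetical names, and the in-memory dict stands in for a real cache store.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict]) -> str:
    """Hash the request so that only identical requests share a key."""
    # sort_keys makes the key independent of dict key order
    canonical = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cache: dict[str, str] = {}


def cached_completion(model, messages, call_provider):
    """Return a cached response if present; otherwise call the provider and store the result."""
    key = cache_key(model, messages)
    if key not in cache:
        cache[key] = call_provider(model, messages)
    return cache[key]


calls = []


def fake_provider(model, messages):
    """Stand-in for a real LLM call; records each invocation."""
    calls.append(messages)
    return "Hi there"


cached_completion("gpt-4o", [{"role": "user", "content": "Hello"}], fake_provider)
cached_completion("gpt-4o", [{"role": "user", "content": "Hello"}], fake_provider)   # cache hit
cached_completion("gpt-4o", [{"role": "user", "content": "Hello!"}], fake_provider)  # miss: text differs
```

The third call shows the limitation: a single extra character produces a different hash, so the provider is called again even though the intent is the same.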

LiteLLM

LiteLLM is the open-source choice for teams that need 100+ provider support and want to run everything on their own infrastructure:

import litellm

# Same code works for any provider
response = litellm.completion(
    model="openai/gpt-4o",          # OpenAI
    # model="anthropic/claude-3-5-sonnet-20241022",   # Anthropic
    # model="together_ai/deepseek-ai/DeepSeek-R1",    # Together
    # model="ollama/llama3.3",                         # Local Ollama
    messages=[{"role": "user", "content": "Hello"}],
)

LiteLLM also ships a proxy server that exposes an OpenAI-compatible API in front of all providers:

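pip install 'litellm[proxy]'

# Fallbacks (here, gpt-4o falling back to claude-3-5-sonnet-20241022) are declared in config.yaml
litellm --config config.yaml --port 4000

Your application calls http://localhost:4000 using the OpenAI request format, and LiteLLM handles provider routing, fallback, and retry logic. The config file lists each deployment under model_list and sets the fallback order in a fallbacks entry.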
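Conceptually, fallback with retry means trying each provider in order and moving on only after its retries are exhausted. The following is a minimal sketch of that pattern under stated assumptions, not LiteLLM's actual implementation: `ProviderError`, `complete_with_fallback`, and the two stand-in providers are hypothetical names introduced for illustration.

```python
import time


class ProviderError(Exception):
    """Raised by a provider call that failed (hypothetical error type)."""


def complete_with_fallback(providers, prompt, retries_per_provider=2, backoff_seconds=0.0):
    """Try each (name, call) pair in order; retry, then move on to the next provider."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt):
    """Stand-in for a provider that is currently down."""
    raise ProviderError("503 from primary")


def healthy_secondary(prompt):
    """Stand-in for a working provider."""
    return f"echo: {prompt}"


used, answer = complete_with_fallback(
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
    "Hello",
)
```

In a real deployment the backoff would be non-zero and only retryable errors (timeouts, 429s, 5xx responses) would trigger a retry; authentication failures should fail fast.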

Portkey

Portkey is the enterprise-grade option. Key features that differentiate it:

Semantic caching — cache responses to semantically similar (not just identical) requests, using embeddings to match queries. Can reduce LLM spend by 20-40% on repetitive workloads.
Virtual keys — team members use Portkey virtual keys instead of raw provider API keys, enabling centralized key rotation without updating every service
Guardrails — content filtering, PII detection, and output validation rules that run on every request/response
Per-request routing — route different users to different model tiers based on custom attributes

import os

from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        # Try targets in order; move to the next one when a request fails
        "strategy": {"mode": "fallback"},
        "targets": [
            {"virtual_key": os.environ["OPENAI_VIRTUAL_KEY"], "override_params": {"model": "gpt-4o"}},
            {"virtual_key": os.environ["ANTHROPIC_VIRTUAL_KEY"], "override_params": {"model": "claude-3-5-sonnet-20241022"}},
        ],
        # Serve semantically similar requests from cache for up to one hour
        "cache": {"mode": "semantic", "max_age": 3600},
    },
)
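The idea behind semantic caching can be sketched in a few lines: embed each query, and on lookup return the stored response whose embedding is close enough to the new query. This is an illustration under loud assumptions, not Portkey's implementation. In particular, `embed` here is a toy word-count vector rather than a real embedding model, and `SemanticCache` and the 0.8 threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real gateway would call an embedding model."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # illustrative similarity cutoff
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        """Return the stored response most similar to the query, if above the threshold."""
        vector = embed(query)
        best = max(self.entries, key=lambda entry: cosine(vector, entry[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


semantic_cache = SemanticCache()
semantic_cache.put("what is the capital of France?", "Paris")
hit = semantic_cache.get("What is the capital city of France?")  # similar wording: served from cache
miss = semantic_cache.get("How do I reset my password?")         # unrelated: falls through to the LLM
```

The threshold is the main tuning knob: set it too low and users get answers to questions they did not ask; set it too high and the cache behaves like exact matching.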

Custom Gateway with Hono

For teams that want full control without managing LiteLLM or paying Portkey fees, a custom gateway built on Hono (Cloudflare Workers) is surprisingly lightweight:

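import { Hono } from "hono";
import OpenAI from "openai";

// Worker bindings: provider keys stored as secrets, plus an Analytics Engine dataset
type Bindings = {
  OPENAI_KEY: string;
  ANTHROPIC_KEY: string;
  ANALYTICS: AnalyticsEngineDataset;
};

const app = new Hono<{ Bindings: Bindings }>();

app.post("/v1/chat/completions", async (c) => {
  const body = await c.req.json();
  const { model, messages, ...rest } = body;

  // Custom routing logic: pick the provider from the model name
  const provider = model.startsWith("claude") ? "anthropic" : "openai";
  const apiKey = provider === "anthropic"
    ? c.env.ANTHROPIC_KEY
    : c.env.OPENAI_KEY;

  const client = new OpenAI({
    apiKey,
    // Anthropic's OpenAI-compatible endpoint; undefined keeps the SDK's default OpenAI URL
    baseURL: provider === "anthropic" ? "https://api.anthropic.com/v1" : undefined,
  });

  // Log to your analytics system. Raw prompts can exceed blob size limits, so log metadata only.
  c.env.ANALYTICS.writeDataPoint({ indexes: [model], blobs: [provider] });

  // Non-streaming only: a Hono handler must return a Response, so wrap the result in c.json
  const completion = await client.chat.completions.create({ model, messages, ...rest, stream: false });
  return c.json(completion);
});

export default app;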

This runs on Cloudflare Workers (zero cold starts, global edge) and gives you complete control over routing logic. The trade-off is maintenance burden: retries, fallback, caching, streaming, and provider API changes all become your responsibility.
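As routing rules grow beyond a single prefix check, it often helps to move them into a table. Here is a small sketch of that design in Python, mirroring the model-prefix rule from the Hono example; the `ROUTES` entries, the `local` provider name, and `pick_provider` are illustrative assumptions rather than part of any library.

```python
# Ordered routing rules: the first matching prefix wins.
# Prefixes and provider names are illustrative assumptions.
ROUTES = [
    ("claude", "anthropic"),
    ("gpt-", "openai"),
    ("ollama/", "local"),
]
DEFAULT_PROVIDER = "openai"


def pick_provider(model: str) -> str:
    """Return the provider for a model name, falling back to the default."""
    for prefix, provider in ROUTES:
        if model.startswith(prefix):
            return provider
    return DEFAULT_PROVIDER


provider_for_claude = pick_provider("claude-3-5-sonnet-20241022")
provider_for_unknown = pick_provider("mistral-large")  # no rule matches, so the default applies
```

Keeping the rules as data means adding a provider is a one-line change, and the same table can later carry cost or latency attributes for more elaborate routing.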

Which to Choose

Cloudflare AI Gateway — starting out, need basic analytics and caching, want zero infrastructure
LiteLLM — open source requirement, 100+ provider support, self-hosted everything
Portkey — enterprise, need semantic caching and guardrails, willing to pay SaaS fees
Custom — highest control, existing Cloudflare/Hono expertise, long-term cost optimization at scale
