The Problem With Multi-Provider LLM Code
If you write code against the OpenAI SDK and later want to switch to Claude or Gemini, you rewrite your API calls. If you want fallback logic (try GPT-4o, fall back to Claude if it's down), you implement it yourself. If you want to track spend across providers, you build that too.
LiteLLM solves all three problems with a single library and an optional proxy server.
The completion() Function
from litellm import completion
# OpenAI
response = completion(model="gpt-4o", messages=[{"role": "user", "content": "Hello"}])
# Anthropic: same function, same response format
response = completion(model="claude-3-5-sonnet-20241022", messages=[{"role": "user", "content": "Hello"}])
# Bedrock Claude
response = completion(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", messages=[{"role": "user", "content": "Hello"}])
# Gemini
response = completion(model="gemini/gemini-1.5-pro", messages=[{"role": "user", "content": "Hello"}])
Every response comes back in the OpenAI chat-completion format regardless of provider, so the generated text is always at response.choices[0].message.content.
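Because the response shape is uniform, downstream code can be written once. The sketch below is illustrative only: `extract_text` and `token_usage` are hypothetical helper names, and the `fake` object is a stand-in with the same attribute layout as an OpenAI-format response so the example runs without network access or API keys.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the assistant text from an OpenAI-format chat completion."""
    return response.choices[0].message.content


def token_usage(response):
    """Return (prompt_tokens, completion_tokens) from the usage block."""
    usage = response.usage
    return usage.prompt_tokens, usage.completion_tokens


# Stand-in with the same attribute layout as a real response (assumption:
# a real completion() result exposes these same attributes).
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(role="assistant", content="Hi there"))],
    usage=SimpleNamespace(prompt_tokens=8, completion_tokens=3, total_tokens=11),
)

print(extract_text(fake))
print(token_usage(fake))
```

With a real call, the same helpers should apply unchanged to the value returned by completion(), whichever provider served the request.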
Fallback Logic
from litellm import completion
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    num_retries=2,
)
If GPT-4o returns an error or hits a rate limit, LiteLLM automatically tries Claude, then Gemini.
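To make the behavior concrete, here is a rough pure-Python sketch of the general "retry, then fall back" pattern. It is not LiteLLM's implementation: `complete_with_fallbacks` and `flaky_call` are hypothetical names, and the exact order in which LiteLLM combines retries and fallbacks may differ.

```python
def complete_with_fallbacks(call, models, messages, num_retries=0):
    """Try each model in order; give each one num_retries extra attempts."""
    failed = []
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return call(model=model, messages=messages)
            except Exception:
                failed.append(model)
    raise RuntimeError(f"all models failed: {failed}")


# Simulated provider call: the primary model is "down", the others work.
attempts = []


def flaky_call(model, messages):
    attempts.append(model)
    if model == "gpt-4o":
        raise ConnectionError("simulated outage")
    return f"answer from {model}"


result = complete_with_fallbacks(
    flaky_call,
    ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    [{"role": "user", "content": "Hello"}],
    num_retries=2,
)
print(result)
print(attempts)
```

In this simulation the primary model is attempted three times (one call plus two retries) before the first fallback answers, and the last fallback is never contacted.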
The Proxy Server
LiteLLM's proxy turns any model into an OpenAI-compatible endpoint. Any tool that accepts an OpenAI API URL can point to your LiteLLM proxy:
litellm --model claude-3-5-sonnet-20241022 --port 8000
Then use the standard OpenAI SDK pointed at localhost:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000", api_key="anything")
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello"}],
)
Cost Tracking and Virtual Keys
The proxy includes built-in cost tracking per request, per key, and per team. Virtual API keys let you issue per-team keys with spend limits:
# config.yaml for litellm proxy
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
  success_callback: ["langfuse"]
  budget_manager: True
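Issuing a per-team key with a spend limit is typically done by calling the proxy's key-management endpoint. The sketch below only builds the HTTP request and does not send it. It assumes the proxy exposes POST /key/generate accepting `team_id`, `models`, and `max_budget`, and that key management has been enabled with a master key and a database; check these against the LiteLLM documentation for your version. `build_key_request`, the team name, and the master key value are hypothetical.

```python
import json
import urllib.request


def build_key_request(proxy_url, master_key, team_id, models, max_budget):
    """Build (but do not send) a request for a budget-limited virtual key."""
    body = {"team_id": team_id, "models": models, "max_budget": max_budget}
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/key/generate",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_key_request(
    "http://localhost:8000",
    "sk-master-placeholder",      # placeholder, not a real key
    "team-research",              # hypothetical team identifier
    ["gpt-4o", "claude-sonnet"],  # model_name aliases from config.yaml
    25.0,                         # spend limit in USD
)
print(req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

The key returned by such a call would then be used as the api_key in the OpenAI client in place of "anything".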
Router for Load Balancing and A/B Testing
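from litellm import Router

# Each model_name is an alias that callers request. The routing strategy
# chooses among deployments that share the same alias.
router = Router(
    model_list=[
        {"model_name": "fast", "litellm_params": {"model": "gpt-4o-mini"}},
        {"model_name": "smart", "litellm_params": {"model": "gpt-4o"}},
    ],
    routing_strategy="latency-based-routing",
)

# Request by alias; the Router resolves it to an underlying deployment.
response = router.completion(model="fast", messages=[...])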
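Load balancing only has an effect when more than one deployment sits behind the same alias. The sketch below shows one way to build such a list. `build_model_list`, `make_router`, and the "chat" alias are hypothetical; the Router import is deferred so that the list-building part runs without litellm installed.

```python
def build_model_list(alias, deployments):
    """Return Router-style entries that all share one model_name alias."""
    return [
        {"model_name": alias, "litellm_params": dict(params)}
        for params in deployments
    ]


# Two deployments behind a single alias, so the router has a real choice.
model_list = build_model_list(
    "chat",
    [
        {"model": "gpt-4o"},
        {"model": "claude-3-5-sonnet-20241022"},
    ],
)


def make_router(entries):
    """Construct a Router over the entries (requires litellm to be installed)."""
    from litellm import Router  # deferred import keeps the helper above standalone

    return Router(model_list=entries, routing_strategy="latency-based-routing")


print([entry["litellm_params"]["model"] for entry in model_list])
```

Calling make_router(model_list).completion(model="chat", messages=...) would then let the routing strategy pick between the two deployments per request.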