Anthropic's Message Batches API provides a 50% discount on Claude model pricing for requests that do not require an immediate response, with results delivered within 24 hours and often much sooner. It works much like OpenAI's Batch API: you submit a batch of requests, Anthropic processes them asynchronously, and you pay half price. The main mechanical difference is that requests are passed directly in the API call rather than uploaded as a JSONL file. Claude is particularly valuable for batch processing long documents and complex reasoning tasks, where its instruction following and document comprehension can justify the cost over cheaper alternatives.
When Claude Batch Processing Is Worth It
Claude's advantages over GPT-4o-mini are most pronounced on tasks requiring careful instruction following over long contexts and nuanced analysis. These are also the highest-value batch processing use cases.
Long document processing. Claude consistently performs better on tasks requiring careful reading of long documents (10,000-200,000 tokens). Legal document review, financial report summarization, academic paper analysis — these tasks see more quality improvement from using Claude over a cheaper model than short-text tasks do. The batch API makes Claude's superior long-context capability affordable at scale.
Complex reasoning with specific output formats. Claude models reliably follow complex output format specifications (nested JSON, structured reports with exact field requirements). For batch jobs where you need machine-parseable outputs, Claude's adherence to format instructions reduces post-processing failure rates.
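One way to put a number on post-processing failure rates is to validate every batch output against the fields you expect. The following is a minimal sketch, assuming each request asked for a single JSON object; the function name `parse_structured_output` and the field names `party` and `term_months` are hypothetical and chosen only for illustration.

```python
import json
from typing import Optional


def parse_structured_output(text: str, required_fields: set) -> Optional[dict]:
    """Return the parsed object if it is a JSON object with all required fields, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not required_fields.issubset(data.keys()):
        return None
    return data


# Hypothetical outputs standing in for text returned by a batch job
outputs = [
    '{"party": "Acme Corp", "term_months": 12}',
    '{"party": "Globex"}',             # missing a required field
    "Here is the JSON you asked for",  # not JSON at all
]
required = {"party", "term_months"}
parsed = [parse_structured_output(o, required) for o in outputs]
failure_rate = sum(p is None for p in parsed) / len(parsed)
print(f"Format failure rate: {failure_rate:.0%}")
```

Tracking this rate per model on a sample of your own documents gives a concrete basis for comparing format adherence.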
Multi-step instruction chains. Tasks with 5+ instructions in a system prompt ("first extract X, then compare with Y, then classify as Z, then write a summary that...") are handled more reliably by Claude than smaller models. At batch API pricing, this capability is available at roughly the same cost per task as GPT-4o standard pricing.
How to Use the Anthropic Message Batches API
The API is available in the official Anthropic Python and TypeScript SDKs.
import anthropic

client = anthropic.Anthropic()

# Create a batch with multiple requests
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "doc-001",
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 1024,
                "messages": [
                    {
                        "role": "user",
                        "content": "Summarize the key financial metrics from this quarterly report: [document text]"
                    }
                ]
            }
        },
        {
            "custom_id": "doc-002",
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 1024,
                "messages": [
                    {
                        "role": "user",
                        "content": "Extract all named entities (people, organizations, locations) from: [document text]"
                    }
                ]
            }
        }
    ]
)

print(f"Batch created: {batch.id}")
print(f"Processing status: {batch.processing_status}")
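In practice the request list is usually generated from your documents rather than written by hand. A minimal sketch of that step follows, assuming documents are held in a dict keyed by the ID you want to use as `custom_id`; the helper name `build_batch_requests` and the sample documents are hypothetical, while the request shape mirrors the example above.

```python
def build_batch_requests(documents: dict, instruction: str,
                         model: str = "claude-3-5-haiku-20241022",
                         max_tokens: int = 1024) -> list:
    """Turn {custom_id: document_text} into a list of batch request dicts."""
    requests = []
    for doc_id, text in documents.items():
        requests.append({
            "custom_id": doc_id,  # dict keys are unique, so custom_ids are too
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"{instruction}\n\n{text}"}
                ],
            },
        })
    return requests


# Hypothetical documents for illustration
docs = {
    "doc-001": "Q3 revenue rose 12% year over year...",
    "doc-002": "Q3 operating margin fell to 8%...",
}
requests = build_batch_requests(
    docs, "Summarize the key financial metrics from this quarterly report:"
)
print(len(requests), requests[0]["custom_id"])

# The list can then be passed to the SDK call shown above:
# batch = client.messages.batches.create(requests=requests)
```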
Checking Status and Retrieving Results
import time

# Reuses the `client` and `batch` objects created in the previous example.
def wait_for_anthropic_batch(batch_id: str, poll_interval: int = 60):
    """Poll until the batch has finished processing, then return it."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        print(f"Status: {batch.processing_status}")
        print(f"Counts: {batch.request_counts}")
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_interval)

completed = wait_for_anthropic_batch(batch.id)

# Stream results
for result in client.messages.batches.results(completed.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    elif result.result.type == "errored":
        print(f"{result.custom_id}: Error - {result.result.error}")
    else:
        # Remaining result types are "canceled" and "expired"
        print(f"{result.custom_id}: Not processed ({result.result.type})")
Pricing Calculation for Common Use Cases
Using Claude 3.5 Haiku (one of Anthropic's lower-cost models) at batch pricing:
- Standard Claude 3.5 Haiku: $0.80/1M input tokens, $4.00/1M output tokens
- Batch Claude 3.5 Haiku: $0.40/1M input tokens, $2.00/1M output tokens
Legal document review, 1,000 documents at 50K tokens each:
- Input tokens: 50M tokens
- Output tokens: 5M tokens (10% of input for summaries)
- Batch cost: (50M × $0.40/1M) + (5M × $2.00/1M) = $20 + $10 = $30
- Standard API cost would be $60
- Savings: $30 on this single batch job
Financial report summarization, 500 reports at 20K tokens each:
- Input tokens: 10M tokens
- Output tokens: 1M tokens
- Batch cost: (10M × $0.40/1M) + (1M × $2.00/1M) = $4 + $2 = $6
- Standard API cost would be $12
- Savings: $6 on this batch job
For teams processing millions of documents monthly, the savings compound significantly.
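The arithmetic above generalizes to any workload. Here is a small sketch that reproduces both worked examples; the function name `estimate_costs` is hypothetical, and the prices are the Claude 3.5 Haiku standard rates quoted in this section, passed in as parameters so they can be updated when pricing changes.

```python
def estimate_costs(input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float,
                   batch_discount: float = 0.5) -> tuple:
    """Return (standard_cost, batch_cost) in dollars for a workload."""
    standard = (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m
    batch = standard * (1 - batch_discount)
    return standard, batch


# Legal review: 1,000 documents x 50K input tokens, 5M output tokens
legal_standard, legal_batch = estimate_costs(50_000_000, 5_000_000, 0.80, 4.00)
print(f"Legal review: standard ${legal_standard:.2f}, batch ${legal_batch:.2f}")

# Financial reports: 500 reports x 20K input tokens, 1M output tokens
fin_standard, fin_batch = estimate_costs(10_000_000, 1_000_000, 0.80, 4.00)
print(f"Financial reports: standard ${fin_standard:.2f}, batch ${fin_batch:.2f}")
```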
Batch Size and Rate Limits
Anthropic's Message Batches API currently supports:
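- Up to 100,000 requests or 256 MB per batch, whichever is reached first (limits have been raised since launch, so confirm against the current documentation)
- Processing completes within 24 hours; requests still unfinished at that point expire
- Results remain available for 29 days after creation, so download them before then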
For larger workloads, create multiple sequential batches. If your workload is time-sensitive and you need results sooner than 24 hours, use the standard synchronous API.
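Splitting a large workload into several batches can be sketched as follows. The helper name `chunk_requests` is hypothetical, and the per-batch request limit is passed in as a parameter rather than hard-coded. This sketch only counts requests; it does not check the total payload size of each batch, which you would need to handle separately.

```python
from itertools import islice


def chunk_requests(requests, max_per_batch: int):
    """Yield successive lists of at most max_per_batch requests."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    iterator = iter(requests)
    while True:
        chunk = list(islice(iterator, max_per_batch))
        if not chunk:
            return
        yield chunk


# Hypothetical workload: 25 placeholder requests, at most 10 per batch
all_requests = [{"custom_id": f"doc-{i:03d}"} for i in range(25)]
chunks = list(chunk_requests(all_requests, max_per_batch=10))
print([len(c) for c in chunks])

# Each chunk could then be submitted as its own batch:
# for chunk in chunk_requests(all_requests, max_per_batch=...):
#     client.messages.batches.create(requests=chunk)
```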
Choosing Between Claude Haiku and Sonnet for Batch Work
The choice between Claude Haiku (cheap, fast, good for most tasks) and Claude Sonnet (expensive, better reasoning) should be driven by a task-specific eval, not assumptions.
As a starting point: use Haiku for classification, extraction, and summarization. Test Sonnet only for tasks where your eval shows Haiku is failing at an unacceptable rate. At batch rates, Claude 3.5 Sonnet ($1.50/1M input tokens, $7.50/1M output tokens) costs roughly 3.75x as much as Claude 3.5 Haiku, which is significant at scale.
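A task-specific eval can be as simple as comparing Haiku's outputs to a small hand-labeled set and checking the failure rate against a threshold you choose. The sketch below illustrates that decision; the function names, the labels, and the 5% threshold are all hypothetical and should be replaced with values that fit your task.

```python
def failure_rate(predictions: list, expected: list) -> float:
    """Fraction of predictions that do not exactly match the expected label."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        raise ValueError("need at least one labeled example")
    wrong = sum(p != e for p, e in zip(predictions, expected))
    return wrong / len(expected)


def should_test_sonnet(haiku_failure_rate: float, max_acceptable: float) -> bool:
    """True when Haiku fails more often than the task can tolerate."""
    return haiku_failure_rate > max_acceptable


# Hypothetical labeled examples and Haiku outputs for a classification task
expected = ["invoice", "contract", "invoice", "memo"]
haiku_predictions = ["invoice", "contract", "memo", "memo"]

rate = failure_rate(haiku_predictions, expected)
print(f"Haiku failure rate: {rate:.0%}")
print("Run the same eval on Sonnet:", should_test_sonnet(rate, max_acceptable=0.05))
```

Exact-match scoring suits classification and extraction; summarization would need a different scoring function, but the threshold logic stays the same.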
Keep Reading
- OpenAI Batch API Guide — The same approach for GPT models, for comparison.
- Prompt Caching: Anthropic and OpenAI Guide — Combine with batch API for further savings on repeated system prompts.
- Cutting LLM API Costs: The Complete Guide — The full framework for reducing LLM spend.