What is OpenAI Batch API?

The OpenAI Batch API lets you submit a large number of API requests as a single batch job, which is processed within 24 hours. In exchange for the delayed response, you receive a 50% discount on standard model pricing. It supports the chat completions and embeddings endpoints.

How does OpenAI Batch API work?

You create a JSONL file where each line is a separate API request with a custom ID, method, URL, and body. Upload this file to OpenAI's Files API, then create a batch job referencing the file. OpenAI processes the requests within 24 hours. You poll for completion and download the results as a JSONL file.

What are the best practices for OpenAI Batch API?

Batch similar-sized requests to avoid wasted capacity. Use the smallest model that meets your accuracy needs. Combine multiple small tasks into one batch job, as long as the total stays within the 50,000 request limit. Monitor batch completion times and submit jobs early if you need results by morning. Retry failed requests by re-submitting only the failed custom IDs.

How much does OpenAI Batch API cost?

Batch API costs 50% of the standard per-token rate. For example, GPT-4o batch input is $1.25/1M tokens vs $2.50 standard, and batch output is $5.00/1M tokens vs $10.00. GPT-4o-mini batch input is $0.075/1M tokens vs $0.15 standard. Embeddings are also half price.

Is OpenAI Batch API worth it in 2026?

Yes. For any non-real-time workload, the Batch API is the easiest way to cut OpenAI costs by 50%. The implementation is simple, savings are immediate, and risk is minimal. If you have batchable tasks like data labeling, nightly analysis, or content moderation, you should be using it.

What are the limits of OpenAI Batch API?

Batch jobs expire after 24 hours if not completed. Maximum batch size is 50,000 requests or 200MB of input data per job. Only /v1/chat/completions and /v1/embeddings endpoints are supported. There is no webhook; you must poll for completion.

OpenAI Batch API: Get 50% Off for Non-Real-Time Requests (2026)

How the Batch API Works

The Batch API does not use different models. It runs the same models (GPT-4o, GPT-4o-mini, text-embedding-3-small, and others) under a different pricing model in exchange for relaxed latency requirements.

The workflow:

Create a JSONL file where each line is one API request

Upload the file to OpenAI's Files API

Create a batch job referencing the uploaded file

Poll the batch job status

When complete, download the results JSONL file

from openai import OpenAI
import json

client = OpenAI()

# Step 1: Create request JSONL
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment of the following text as positive, negative, or neutral."},
                {"role": "user", "content": "The product arrived on time and works perfectly."}
            ],
            "max_tokens": 10
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment of the following text as positive, negative, or neutral."},
                {"role": "user", "content": "Terrible experience, would not recommend."}
            ],
            "max_tokens": 10
        }
    }
]

# Write to JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Step 2: Upload the file
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

# Step 3: Create batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch job created: {batch_job.id}")
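The two requests above are written out by hand. For a real backlog you would more likely generate them from a list of inputs. The following is a minimal sketch of that idea, reusing the same request shape; the helper names `build_request` and `to_jsonl` are hypothetical and not part of the OpenAI SDK.

```python
import json

SYSTEM_PROMPT = (
    "Classify the sentiment of the following text as positive, negative, or neutral."
)


def build_request(index: int, text: str, model: str = "gpt-4o-mini") -> dict:
    """Return one batch request in the same shape as the hand-written examples."""
    return {
        "custom_id": f"request-{index}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
            ],
            "max_tokens": 10,
        },
    }


def to_jsonl(texts: list[str]) -> str:
    """Serialize one request per line, numbering custom IDs from 1."""
    return "".join(
        json.dumps(build_request(i, text)) + "\n"
        for i, text in enumerate(texts, start=1)
    )


jsonl = to_jsonl([
    "The product arrived on time and works perfectly.",
    "Terrible experience, would not recommend.",
])
```

Writing `jsonl` to `batch_requests.jsonl` yields the same file as the manual version. Deriving each `custom_id` from a stable key in your own data (a ticket ID, for example) rather than a counter makes it easier to match results back to their inputs.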

Checking Status and Retrieving Results

import time

def wait_for_batch(batch_id: str, poll_interval: int = 60):
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Status: {batch.status}, completed: {batch.request_counts.completed}/{batch.request_counts.total}")

        if batch.status == "completed":
            return batch
        elif batch.status in ["failed", "expired", "cancelled"]:
            raise Exception(f"Batch failed with status: {batch.status}")

        time.sleep(poll_interval)

# Wait for completion (in production, use a scheduled job instead of a blocking loop)
completed_batch = wait_for_batch(batch_job.id)

# Download results
result_file = client.files.content(completed_batch.output_file_id)
results = [json.loads(line) for line in result_file.text.strip().split("\n")]

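for result in results:
    custom_id = result["custom_id"]
    response = result.get("response")
    # Skip any line that does not carry a successful response body
    if result.get("error") or not response or response.get("status_code") != 200:
        print(f"{custom_id}: failed ({result.get('error')})")
        continue
    response_content = response["body"]["choices"][0]["message"]["content"]
    print(f"{custom_id}: {response_content}")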

Pricing: Actual Savings

The batch pricing is 50% of the standard price. As of May 2026:

Model	Standard Input	Batch Input	Standard Output	Batch Output
GPT-4o	$2.50/1M	$1.25/1M	$10.00/1M	$5.00/1M
GPT-4o-mini	$0.15/1M	$0.075/1M	$0.60/1M	$0.30/1M
text-embedding-3-small	$0.02/1M	$0.01/1M	—	—

For a data labeling workload processing 100 million input tokens per month on GPT-4o-mini, the savings are $7.50/month ($15 standard vs. $7.50 batch), before counting output tokens. For the same workload on GPT-4o, the savings are $125/month ($250 standard vs. $125 batch). For high-volume embedding workloads, the Batch API likewise cuts embedding costs in half.
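The arithmetic behind those figures can be sketched as a small calculator. The prices are copied from the table above; the function names `monthly_cost` and `monthly_savings` are made up for this example, and you should swap in current prices before relying on the output.

```python
# USD per 1M tokens, copied from the pricing table in this article
PRICES = {
    "gpt-4o": {
        "standard_input": 2.50, "batch_input": 1.25,
        "standard_output": 10.00, "batch_output": 5.00,
    },
    "gpt-4o-mini": {
        "standard_input": 0.15, "batch_input": 0.075,
        "standard_output": 0.60, "batch_output": 0.30,
    },
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int = 0,
                 batch: bool = False) -> float:
    """Cost in USD for the given token volumes at standard or batch rates."""
    tier = "batch" if batch else "standard"
    price = PRICES[model]
    total = (input_tokens * price[f"{tier}_input"]
             + output_tokens * price[f"{tier}_output"])
    return total / 1_000_000


def monthly_savings(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Difference between standard and batch cost for the same workload."""
    return (monthly_cost(model, input_tokens, output_tokens, batch=False)
            - monthly_cost(model, input_tokens, output_tokens, batch=True))


for model in PRICES:
    print(f"{model}: ${monthly_savings(model, 100_000_000):.2f} saved per month")
```

Passing a non-zero `output_tokens` gives a more realistic estimate, since output tokens cost four times as much as input tokens for both models in the table.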

Use Cases That Are Perfect for Batch API

Sentiment analysis at scale. If you process customer feedback, support tickets, or social media mentions, these can all be batched and processed overnight.

Document processing pipelines. Summarizing documents, extracting entities, classifying content — anything that processes a backlog of documents on a schedule rather than in real time.

Data augmentation for ML. Generating synthetic training data, labeling examples, generating variations — all high-volume, non-real-time workloads.

Nightly report generation. Generating summaries of daily activity, flagging anomalies, creating management reports — runs overnight and is ready in the morning.

Bulk content moderation. For platforms that moderate user-generated content, batch processing of older content that does not need immediate review.

Limits and Considerations

Batch jobs expire after 24 hours if not completed. If OpenAI is under high load, your job might not complete within the window — though in practice this is rare.

The maximum batch size is 50,000 requests or 200MB of input data per job, whichever limit is reached first. For larger workloads, split into multiple batch jobs.
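One way to do the split is to walk the request list and start a new group whenever either limit would be exceeded. This is a sketch, not an official utility: `split_into_batches` is a hypothetical helper, and it assumes 200MB means 200,000,000 bytes, which is the more conservative reading.

```python
import json

MAX_REQUESTS = 50_000
MAX_BYTES = 200 * 1000 * 1000  # assumption: 200MB read as decimal megabytes


def split_into_batches(requests: list[dict],
                       max_requests: int = MAX_REQUESTS,
                       max_bytes: int = MAX_BYTES) -> list[list[dict]]:
    """Group requests so each group fits within both the count and size limits."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_bytes = 0
    for req in requests:
        # Measure the line exactly as it would be written to the JSONL file
        line_bytes = len((json.dumps(req) + "\n").encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError(f"Request {req.get('custom_id')} exceeds the size limit")
        too_many = len(current) >= max_requests
        too_big = current_bytes + line_bytes > max_bytes
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += line_bytes
    if current:
        batches.append(current)
    return batches
```

Each returned group can then be written to its own JSONL file and submitted as a separate batch job. Keeping `custom_id` values unique across all groups lets you merge the results afterwards.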

Batch requests must use the /v1/chat/completions or /v1/embeddings endpoint. Other endpoints (like images or audio) are not supported.

There is no webhook support for batch completion — you must poll. In production, implement a scheduled job (cron, Celery beat, or similar) to check batch status every 15-30 minutes.

Best Practices for Maximizing Savings

To get the most out of the Batch API, follow these guidelines:

Batch similar-sized requests to avoid wasted capacity. If you mix very short and very long requests, the batch may take longer to complete.
Use the smallest model that meets your accuracy needs. GPT-4o-mini is often sufficient for classification and extraction tasks, and its batch rate is roughly one-seventeenth of GPT-4o's ($0.075 vs. $1.25 per 1M input tokens).
Combine multiple small tasks that target the same endpoint into one batch job, as long as the total stays within the 50,000 request limit.
Monitor batch completion times during your peak hours. If you need results by morning, submit jobs before the end of the business day.
Retry failed requests by checking the batch's error file (referenced by error_file_id on the batch object) and the per-request error field in the results. You can re-submit only the failed custom IDs.
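The retry step can be sketched as two small functions: one that collects the failed custom IDs from result lines, and one that filters the original requests down to those IDs. Both helper names are hypothetical. The sketch assumes each result line carries `custom_id`, a top-level `error`, and a `response` object with a `status_code`; verify that against the current output format before using it.

```python
import json


def failed_custom_ids(result_lines: list[str]) -> set[str]:
    """Collect custom IDs whose result line signals an error or a non-200 status."""
    failed: set[str] = set()
    for line in result_lines:
        if not line.strip():
            continue  # ignore blank lines, such as a trailing newline
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            failed.add(record["custom_id"])
    return failed


def build_retry_requests(original_requests: list[dict],
                         failed_ids: set[str]) -> list[dict]:
    """Keep only the original requests that need to be submitted again."""
    return [req for req in original_requests if req["custom_id"] in failed_ids]
```

The list returned by `build_retry_requests` can be written to a new JSONL file and submitted as a fresh batch job. It is worth capping the number of retry rounds so that a request that fails for a permanent reason, such as an invalid body, is not resubmitted indefinitely.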

Is the Batch API Worth It in 2026?

Absolutely. For any non-real-time workload, the Batch API is the single easiest way to cut OpenAI costs in half. The implementation is straightforward, the savings are immediate, and the risk is minimal. If you are not using it for your batchable tasks, you are leaving money on the table.

Keep Reading

Anthropic Batch API Guide — The same approach for Claude models.
Cutting LLM API Costs: The Complete Guide — Full framework combining all cost reduction strategies.
LLM API Pricing Comparison 2026 — Current pricing across providers to identify your highest-cost workloads.


Mahmudul Haque Qudrati
