OpenAI's Batch API lets you submit large volumes of requests and get them processed within 24 hours at 50% off the standard API price. If your workload doesn't need real-time responses — think data labeling, content generation, bulk analysis — this is the single biggest cost reduction available.
How It Works
Instead of making individual API calls, you upload a file containing all your requests as JSONL (one JSON object per line). OpenAI processes them asynchronously and returns the results when done. The tradeoff is latency: results come back within 24 hours instead of seconds.
The pricing discount applies to all tokens — both input and output:
- GPT-4o: $1.25/M input, $5.00/M output (vs $2.50/$10.00 standard)
- GPT-4o-mini: $0.075/M input, $0.30/M output (vs $0.15/$0.60 standard)
- o3: $5.00/M input, $20.00/M output (vs $10.00/$40.00 standard)
Step-by-Step Setup
1. Create the JSONL Input File
Each line is a JSON object with a custom ID, the HTTP method, the endpoint URL, and the request body:
```python
import json

# One request per line; custom_id is how you match results back to inputs later
requests = [
    {
        "custom_id": "req-001",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "Classify the sentiment."},
                {"role": "user", "content": "I love this product!"},
            ],
            "max_tokens": 10,
        },
    },
    {
        "custom_id": "req-002",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "Classify the sentiment."},
                {"role": "user", "content": "Terrible experience."},
            ],
            "max_tokens": 10,
        },
    },
]

# Write as JSONL: one JSON object per line
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```
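If you generate the input file programmatically, it's worth a sanity check before uploading. At the time of writing, OpenAI documents limits of 50,000 requests per batch and a 200 MB input file; the sketch below hard-codes those numbers, so verify them against the current docs:

```python
import os

MAX_REQUESTS = 50_000          # documented per-batch limit (verify in current docs)
MAX_BYTES = 200 * 1024 * 1024  # documented input file size limit

with open("batch_input.jsonl") as f:
    num_requests = sum(1 for _ in f)

file_size = os.path.getsize("batch_input.jsonl")
assert num_requests <= MAX_REQUESTS, f"Too many requests: {num_requests}"
assert file_size <= MAX_BYTES, f"File too large: {file_size} bytes"
```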
2. Upload and Submit the Batch
```python
from openai import OpenAI

client = OpenAI()

# Upload the input file
input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# Create the batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
```
3. Poll for Completion
```python
import time

# Poll until the batch reaches a terminal state
while True:
    batch = client.batches.retrieve(batch.id)
    counts = batch.request_counts
    print(f"Status: {batch.status} ({counts.completed}/{counts.total})")
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # Check every minute
```
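Before reaching a terminal state, a batch moves through validating, in_progress, and finalizing. If you need to abort a run partway through, the API also supports cancellation; a minimal sketch:

```python
# Cancel an in-flight batch; it moves to "cancelling" and then "cancelled".
# Requests already completed may still appear in the output file.
client.batches.cancel(batch.id)
```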
4. Download Results
```python
if batch.status == "completed":
    result_file = client.files.content(batch.output_file_id)
    # Output lines are not guaranteed to be in input order,
    # so always match results by custom_id
    for line in result_file.text.strip().split("\n"):
        result = json.loads(line)
        custom_id = result["custom_id"]
        response = result["response"]["body"]
        answer = response["choices"][0]["message"]["content"]
        print(f"{custom_id}: {answer}")
```
When to Use Batch API
Batch processing is ideal for workloads where latency doesn't matter:
- Data labeling: Classify, tag, or annotate thousands of records
- Content generation: Generate product descriptions, summaries, or translations in bulk
- Evaluation: Run test suites against your prompts across many inputs
- Embeddings: Generate embeddings for large document corpora (see the example after this list)
- Nightly processing: Analyze daily logs, generate reports, process queued tasks
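The same JSONL format covers the embeddings use case above; each request line just targets /v1/embeddings instead. A minimal sketch (the model name here is one of the current embedding models):

```python
# A batch request line for /v1/embeddings
embedding_request = {
    "custom_id": "doc-001",
    "method": "POST",
    "url": "/v1/embeddings",
    "body": {
        "model": "text-embedding-3-small",
        "input": "Full text of the document to embed...",
    },
}
```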
When NOT to Use It
- Real-time chat: Users can't wait 24 hours for a response
- Interactive applications: Anything requiring sub-second latency
- Small volumes: Under 1,000 requests, the setup overhead isn't worth it
- Time-sensitive data: If the data changes frequently, 24-hour-old results may be stale
Cost Savings at Scale
Let's calculate savings for a real workload: classifying 100,000 customer reviews with GPT-4o.
- Average input: 200 tokens per review (system prompt + review text)
- Average output: 20 tokens per classification
- Total input: 20M tokens, Total output: 2M tokens
Standard API: (20M × $2.50/M) + (2M × $10/M) = $50 + $20 = $70
Batch API: (20M × $1.25/M) + (2M × $5/M) = $25 + $10 = $35
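The same arithmetic in code, if you want to plug in your own numbers (prices hard-coded from the table above, so update them as they change):

```python
def cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost, with prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

standard = cost(20e6, 2e6, 2.50, 10.00)  # $70.00
batch = cost(20e6, 2e6, 1.25, 5.00)      # $35.00
print(f"Standard: ${standard:.2f}  Batch: ${batch:.2f}  Saved: ${standard - batch:.2f}")
```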
That's $35 saved on a single batch run. Run this daily and you save over $1,000 per month.
If your workload can tolerate 24-hour latency, the Batch API is free money. Same model, same quality, half the price.