OpenAI's Batch API lets you submit large volumes of requests and get them processed within 24 hours at 50% off the standard API price. If your workload doesn't need real-time responses — think data labeling, content generation, bulk analysis — this is the single biggest cost reduction available.

How It Works

Instead of making individual API calls, you upload a single JSONL file (one JSON object per line) containing all your requests. OpenAI processes them asynchronously and writes the results to an output file. The tradeoff is latency: results arrive within 24 hours, often sooner, instead of seconds.

The pricing discount applies to all tokens — both input and output:

  • GPT-4o: $1.25/M input, $5.00/M output (vs $2.50/$10.00 standard)
  • GPT-4o-mini: $0.075/M input, $0.30/M output (vs $0.15/$0.60 standard)
  • o3: $5.00/M input, $20.00/M output (vs $10.00/$40.00 standard)

Step-by-Step Setup

1. Create the JSONL Input File

Each line is a JSON object with a custom ID, the HTTP method, the endpoint URL, and the request body:

import json

requests = [
    {
        "custom_id": "req-001",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "Classify the sentiment."},
                {"role": "user", "content": "I love this product!"}
            ],
            "max_tokens": 10
        }
    },
    {
        "custom_id": "req-002",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "Classify the sentiment."},
                {"role": "user", "content": "Terrible experience."}
            ],
            "max_tokens": 10
        }
    }
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
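
Before uploading, a quick sanity check pays off: every line must be valid JSON, and each custom_id should be unique within the batch or validation can fail. A minimal check, reusing the json import above:

seen_ids = set()
with open("batch_input.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        req = json.loads(line)  # raises json.JSONDecodeError on a malformed line
        if req["custom_id"] in seen_ids:
            raise ValueError(f"Duplicate custom_id on line {line_number}")
        seen_ids.add(req["custom_id"])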

2. Upload and Submit the Batch

from openai import OpenAI

client = OpenAI()

# Upload the input file
input_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch"
)

# Create the batch
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")

3. Poll for Completion

import time

while True:
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status} "
          f"({batch.request_counts.completed}/"
          f"{batch.request_counts.total})")

    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # Check every minute

4. Download Results

if batch.status == "completed":
    result_file = client.files.content(
        batch.output_file_id
    )
    results = result_file.text.strip().split("\n")

    for line in results:
        result = json.loads(line)
        custom_id = result["custom_id"]
        response = result["response"]["body"]
        answer = response["choices"][0]["message"]["content"]
        print(f"{custom_id}: {answer}")

When to Use Batch API

Batch processing is ideal for workloads where latency doesn't matter:

  • Data labeling: Classify, tag, or annotate thousands of records
  • Content generation: Generate product descriptions, summaries, or translations in bulk
  • Evaluation: Run test suites against your prompts across many inputs
  • Embeddings: Generate embeddings for large document corpora (see the request sketch after this list)
  • Nightly processing: Analyze daily logs, generate reports, process queued tasks
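
Batches aren't limited to chat completions; /v1/embeddings is also a supported endpoint, with each JSONL line pointing there instead. Note that a single batch must target one endpoint, matching the endpoint argument passed to batches.create. A sketch of one request line (the custom_id and input text are placeholders):

embedding_request = {
    "custom_id": "doc-001",
    "method": "POST",
    "url": "/v1/embeddings",
    "body": {
        "model": "text-embedding-3-small",
        "input": "Text of the first document"
    }
}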

When NOT to Use It

  • Real-time chat: Users can't wait 24 hours for a response
  • Interactive applications: Anything requiring sub-second latency
  • Small volumes: Under 1,000 requests, the setup overhead isn't worth it
  • Time-sensitive data: If the data changes frequently, 24-hour-old results may be stale

Cost Savings at Scale

Let's calculate savings for a real workload: classifying 100,000 customer reviews with GPT-4o.

  • Average input: 200 tokens per review (system prompt + review text)
  • Average output: 20 tokens per classification
  • Total input: 20M tokens, Total output: 2M tokens

Standard API: (20M × $2.50/M) + (2M × $10/M) = $50 + $20 = $70

Batch API: (20M × $1.25/M) + (2M × $5/M) = $25 + $10 = $35

That's $35 saved on a single batch run. Run this daily and you save over $1,000 per month.
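
The same arithmetic as a small helper, using the GPT-4o rates quoted above (swap in other models' rates as needed):

def batch_savings(input_m_tokens, output_m_tokens,
                  input_rate=2.50, output_rate=10.00):
    """Return (standard, batch) cost in dollars; rates are standard $/M tokens."""
    standard = input_m_tokens * input_rate + output_m_tokens * output_rate
    return standard, standard / 2  # Batch API is a flat 50% discount

standard, batch = batch_savings(20, 2)
print(f"Standard: ${standard:.2f}  Batch: ${batch:.2f}  Saved: ${standard - batch:.2f}")
# Standard: $70.00  Batch: $35.00  Saved: $35.00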

If your workload can tolerate 24-hour latency, the Batch API is free money. Same model, same quality, half the price.