Your system prompt is sent with every single API request. A 2,000-token system prompt across 100,000 daily requests means 200 million tokens per day just for instructions. Cutting that prompt in half saves 100 million tokens daily. Here's how to do it without losing any instruction quality.
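The arithmetic above is worth sanity-checking yourself before investing in optimization. A minimal sketch (function name and figures are illustrative, matching the numbers in the paragraph):

```python
# Back-of-envelope savings from trimming a system prompt.
def daily_token_cost(prompt_tokens: int, daily_requests: int) -> int:
    """Tokens spent per day on the system prompt alone."""
    return prompt_tokens * daily_requests

before = daily_token_cost(2_000, 100_000)  # 200,000,000 tokens/day
after = daily_token_cost(1_000, 100_000)   # 100,000,000 tokens/day
print(f"Saved per day: {before - after:,} tokens")
```

Plug in your own prompt size and request volume to see whether the optimization is worth your time.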

Before and After: A Complete Example

Let's start with a real-world system prompt for a customer support bot and optimize it step by step.

Before: 285 tokens

You are a helpful customer support assistant for Acme Corp.
You should always be polite and professional in your responses.
You should never make up information that you don't know.
If you don't know the answer to a question, you should say
"I don't know" and suggest that the customer contact our
support team at support@acme.com.

When responding to customers, please follow these guidelines:
1. Always greet the customer first
2. Address their question or concern directly
3. If the issue requires escalation, let them know
4. Always end with asking if there's anything else you can help with

You have access to the following information about our products:
- Widget Pro: $49.99, available in red, blue, green
- Widget Basic: $29.99, available in black, white
- All widgets come with a 30-day money-back guarantee
- Shipping is free on orders over $50

After: 112 tokens

Acme Corp support agent. Be polite, professional, factual.
Unknown answers: say so, direct to support@acme.com.

Response format: greet → answer → escalate if needed → ask
if anything else.

Products:
- Widget Pro: $49.99 (red/blue/green)
- Widget Basic: $29.99 (black/white)
- 30-day guarantee. Free shipping over $50.

Same instructions, 61% fewer tokens. The model follows both versions equally well.

Technique 1: Eliminate "You should" and "Please"

Models don't need politeness in instructions. They follow directives regardless of phrasing.

  • "You should always respond in JSON format" → "Respond in JSON."
  • "Please make sure to include all required fields" → "Include all required fields."
  • "You are a helpful assistant that..." → "Role: [description]."
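A first pass at this can even be mechanical. A rough sketch, with illustrative patterns (review the output by hand, since blind stripping can change meaning):

```python
import re

# Illustrative filler patterns; extend to taste.
FILLER = [
    (r"\byou should always\b\s*", ""),
    (r"\byou should never\b\s*", "never "),
    (r"\byou should\b\s*", ""),
    (r"\bplease make sure to\b\s*", ""),
    (r"\bplease\b\s*", ""),
]

def strip_filler(instruction: str) -> str:
    out = instruction
    for pattern, repl in FILLER:
        out = re.sub(pattern, repl, out, flags=re.IGNORECASE)
    # Re-capitalize after stripping a leading phrase.
    return out[:1].upper() + out[1:] if out else out

print(strip_filler("You should always respond in JSON format."))
# -> "Respond in JSON format."
```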

Technique 2: Use Shorthand Notation

Replace verbose descriptions with compact notation that models understand perfectly:

  • "available in red, blue, and green colors" → "(red/blue/green)"
  • "The price is $49.99 per unit" → "$49.99/unit"
  • "respond using the JSON format" → "Output: JSON"
  • "between 100 and 500 words" → "100-500 words"
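If the same verbose phrases recur across many prompts, a substitution table keeps the rewrites consistent. A sketch with illustrative entries (apply with care, since blind replacement can garble text outside the intended phrase):

```python
# Illustrative verbose-to-shorthand substitutions.
SHORTHAND = {
    "available in red, blue, and green colors": "(red/blue/green)",
    "The price is $49.99 per unit": "$49.99/unit",
    "between 100 and 500 words": "100-500 words",
}

def apply_shorthand(text: str) -> str:
    for verbose, compact in SHORTHAND.items():
        text = text.replace(verbose, compact)
    return text

print(apply_shorthand("Widgets are available in red, blue, and green colors."))
# -> "Widgets are (red/blue/green)."
```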

Technique 3: Use Structured Formats

Replace prose paragraphs with structured data. Models parse structured formats more reliably anyway.

# Before (verbose prose)
When the user asks about pricing, you should check which
plan they're interested in. Our Basic plan costs $10 per
month and includes 1000 API calls. Our Pro plan costs $50
per month and includes 10000 API calls.

# After (structured)
Pricing:
- Basic: $10/mo, 1K API calls
- Pro: $50/mo, 10K API calls
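A nice side effect of structured formats: you can keep the data as data and render the compact prompt section from it, so updates never reintroduce prose. A sketch, with plan data mirroring the example above:

```python
# Render a pricing table as compact structured lines instead of prose.
plans = {
    "Basic": {"price": "$10/mo", "calls": "1K API calls"},
    "Pro": {"price": "$50/mo", "calls": "10K API calls"},
}

def render_pricing(plans: dict) -> str:
    lines = ["Pricing:"]
    for name, plan in plans.items():
        lines.append(f"- {name}: {plan['price']}, {plan['calls']}")
    return "\n".join(lines)

print(render_pricing(plans))
```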

Technique 4: Merge Redundant Rules

Multiple rules that say similar things can be combined:

# Before
- Do not make up information
- Only use facts from the provided context
- If unsure, say you don't know
- Never hallucinate or fabricate data

# After
- Only use provided context. If unsure, say so.

Technique 5: Use Arrows for Workflows

Sequential steps compress well with arrow notation:

# Before (42 tokens)
First, analyze the user's question. Then, search the
knowledge base for relevant information. After that,
formulate a response. Finally, ask if they need more help.

# After (18 tokens)
Flow: analyze question → search KB → respond → ask if
more help needed.
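If your workflows live in config or code, the arrow form can be generated rather than hand-written. A minimal sketch (function name is illustrative):

```python
def compress_workflow(steps: list[str]) -> str:
    """Join sequential steps with arrow notation."""
    return "Flow: " + " → ".join(steps)

print(compress_workflow(
    ["analyze question", "search KB", "respond", "ask if more help needed"]
))
# -> "Flow: analyze question → search KB → respond → ask if more help needed"
```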

Measuring the Impact

After optimizing, always verify two things:

  • Token count: Use a token counter to confirm the reduction
  • Output quality: Run your test suite with both versions and compare. If outputs are equivalent, ship the shorter version
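For the token count, a quick comparison script is enough to track progress. This sketch uses a very rough characters-per-token heuristic; for real measurements use your model's actual tokenizer (e.g. tiktoken for OpenAI models):

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    # Use the model's real tokenizer for anything that matters.
    return max(1, len(text) // 4)

before_prompt = "You should always be polite and professional in your responses."
after_prompt = "Be polite, professional."

b, a = rough_token_count(before_prompt), rough_token_count(after_prompt)
print(f"before≈{b}, after≈{a}, saved≈{100 * (b - a) // b}%")
```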

System prompt optimization has the highest ROI of any token reduction technique because the savings multiply across every single request your application makes.