Every time you send a message to ChatGPT, Claude, or any other large language model, your text gets broken into small pieces called tokens before the model processes it. Understanding tokens is essential because they directly determine how much you pay and how much text you can fit into a single request.

Tokens Are Not Words

A common misconception is that one token equals one word. In reality, tokens are subword units — fragments of text that a tokenizer has learned to recognize as useful building blocks. A short, common English word like "hello" is typically a single token. But a longer or rarer word gets split into multiple pieces.

Consider the difference:

  • "hello" → 1 token
  • "indescribable" → 3 tokens (ind, escrib, able)
  • "antidisestablishmentarianism" → 6 tokens

Numbers, punctuation, and whitespace also consume tokens. The string "2024-01-15" might use 5 tokens, while a single space between words is usually merged into the following word's token.
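
You can inspect these splits yourself. The sketch below uses OpenAI's tiktoken library; exact counts are an assumption here, since they depend on which encoding your model uses (cl100k_base in this example):

    # Minimal sketch using OpenAI's tiktoken library. Exact token
    # counts depend on the encoding; cl100k_base is assumed here,
    # and other models may split the same words differently.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["hello", "indescribable", "2024-01-15"]:
        ids = enc.encode(text)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{text!r}: {len(ids)} tokens -> {pieces}")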

How Tokenization Actually Works

Modern LLMs use algorithms like Byte Pair Encoding (BPE) to build their vocabulary. During training, the algorithm starts with individual characters and repeatedly merges the most frequent pairs until it reaches a target vocabulary size — typically 50,000 to 200,000 entries.

The result is a fixed dictionary of token IDs. When you send text to an API, the tokenizer converts your string into a sequence of these IDs. The model only sees numbers, never raw text.
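
To make the merge loop concrete, here is a toy sketch of BPE training steps over a tiny corpus. It is illustrative only: production tokenizers operate on bytes and are heavily optimized, but the core idea is the same.

    # Toy illustration of one BPE merge step: find the most frequent
    # adjacent symbol pair and fuse it into a single new symbol.
    # Real tokenizers repeat this until the vocabulary hits its
    # target size.
    from collections import Counter

    def merge_step(words):
        # words: list of symbol sequences, e.g. [["h","e","l","l","o"], ...]
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            return words, None
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merged = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(w[i] + w[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            merged.append(out)
        return merged, best

    words = [list("hello"), list("help"), list("hell")]
    for _ in range(3):
        words, pair = merge_step(words)
        print("merged", pair, "->", words)

After three merges, the common prefix "hell" has become a single symbol, which is exactly how frequent fragments earn their own token IDs.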

A rough rule of thumb for English: 1 token ≈ 4 characters, or about ¾ of a word. But this varies significantly by language and content type.
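
That heuristic is easy to encode as a quick pre-check. Treat it as a rough English-only estimate, not a substitute for running a real tokenizer:

    # Rough English-only heuristic: ~4 characters per token.
    # Use a real tokenizer whenever the count actually matters.
    def estimate_tokens(text: str) -> int:
        return max(1, round(len(text) / 4))

    print(estimate_tokens("Understanding tokens is essential."))  # roughly 8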

Why Tokens Matter for Cost

API providers charge per token, not per word or character. OpenAI's GPT-4o charges $2.50 per million input tokens and $10 per million output tokens. If your prompt uses 2,000 tokens when it could use 800, you're paying 2.5x as much as necessary on every single request.

At scale, this adds up fast. A chatbot handling 100,000 conversations per day with an extra 1,000 tokens per conversation wastes 100 million tokens daily — hundreds of dollars in unnecessary cost.
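
The arithmetic is worth sanity-checking in code. The rates below are GPT-4o's published per-million-token prices from above; the traffic figures are the hypothetical scenario just described:

    # Worked cost estimate for the scenario above. Rates are GPT-4o's
    # input/output prices per million tokens; traffic is hypothetical.
    INPUT_RATE = 2.50 / 1_000_000    # dollars per input token
    OUTPUT_RATE = 10.00 / 1_000_000  # dollars per output token

    conversations_per_day = 100_000
    wasted_input_tokens = 1_000      # avoidable tokens per conversation

    daily_waste = conversations_per_day * wasted_input_tokens * INPUT_RATE
    print(f"${daily_waste:,.2f} per day")        # $250.00
    print(f"${daily_waste * 365:,.2f} per year") # $91,250.00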

Input vs. Output Tokens

Most providers charge differently for input (your prompt) and output (the model's response). Output tokens are typically 2–4x more expensive. This means controlling the length of the model's response can save even more than trimming your prompt.
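
Most SDKs let you cap output length directly. The sketch below assumes the OpenAI Python SDK's chat completions interface; other providers expose an equivalent parameter under a different name:

    # Capping output with the OpenAI Python SDK (assumed here; other
    # providers use similarly named parameters). max_tokens bounds the
    # response, so you pay for at most that many output tokens.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Summarize tokenization in two sentences."}],
        max_tokens=100,  # hard cap on output tokens
    )
    print(response.choices[0].message.content)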

Why Tokens Matter for Context Windows

Every model has a context window — the maximum number of tokens it can process in a single request. This includes both your input and the model's output. GPT-4o supports 128K tokens, Claude 3.5 supports 200K, and Gemini 1.5 Pro supports up to 2 million.

If your input exceeds the context window, the API will reject the request or silently truncate it. When building applications that process long documents, you need to count tokens beforehand to avoid hitting these limits.
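
A pre-flight count is straightforward with tiktoken. The 128K limit below matches GPT-4o; the amount reserved for output is a hypothetical budget you would tune for your application:

    # Pre-flight check: count tokens before sending, leaving room
    # for the response. 128K matches GPT-4o's context window; the
    # 4,096-token output reservation is a hypothetical budget.
    import tiktoken

    CONTEXT_WINDOW = 128_000
    RESERVED_FOR_OUTPUT = 4_096

    def fits(prompt: str, encoding: str = "cl100k_base") -> bool:
        """Return True if prompt plus reserved output fits the window."""
        n = len(tiktoken.get_encoding(encoding).encode(prompt))
        return n + RESERVED_FOR_OUTPUT <= CONTEXT_WINDOW

    print(fits("A short prompt easily fits."))  # True
    print(fits("word " * 200_000))              # False: far too long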

Practical Implications

Understanding tokens helps you:

  • Estimate costs before committing to an API provider
  • Optimize prompts to fit more useful content within context limits
  • Choose the right model based on your document sizes
  • Debug unexpected behavior when responses get cut off
  • Build better RAG systems by chunking documents at token boundaries (a sketch follows at the end of this section)

The first step is always measuring. Use a token counter to see exactly how your text breaks down before sending it to any API.
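
For the RAG chunking point above, here is a hedged sketch: encode the document once, slice the token ID sequence into fixed-size windows, and decode each slice back to text. The chunk size and overlap are illustrative choices, not recommendations:

    # Token-boundary chunking sketch for RAG-style pipelines.
    # Chunk size and overlap are illustrative; tune for your retriever.
    import tiktoken

    def chunk_by_tokens(text, chunk_size=500, overlap=50,
                        encoding="cl100k_base"):
        enc = tiktoken.get_encoding(encoding)
        ids = enc.encode(text)
        step = chunk_size - overlap
        chunks = []
        for start in range(0, len(ids), step):
            chunks.append(enc.decode(ids[start:start + chunk_size]))
            if start + chunk_size >= len(ids):
                break
        return chunks

    document = "Tokens are subword units. " * 400
    print(len(chunk_by_tokens(document)), "chunks")

Slicing on token IDs rather than characters guarantees every chunk fits a known token budget, which is exactly the constraint your embedding model and context window impose.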