A token in a large language model is a chunk of text that the model treats as a single unit of processing. It is not a word, a character, or a syllable, though it can coincide with any of those depending on the model and the word. In practice, one token is roughly three to four characters of English text, so a 1,000-word document is approximately 1,300 to 1,500 tokens. Keeping this ratio in mind helps you reduce API costs and write better prompts.
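As a rough illustration of the rule of thumb above, the sketch below estimates a token range from a word count. The function name and the 1.3 to 1.5 tokens-per-word ratios are assumptions derived from the figures in this paragraph, not measured values. Use a real tokenizer when accuracy matters.

```python
def estimate_token_range(text: str) -> tuple:
    """Rough token estimate for English text (a heuristic, not a tokenizer).

    Assumes 1.3 to 1.5 tokens per whitespace-separated word, matching the
    rule of thumb in the text.
    """
    word_count = len(text.split())
    low = round(word_count * 1.3)
    high = round(word_count * 1.5)
    return low, high


sample = "word " * 1000  # stand-in for a 1,000-word document
print(estimate_token_range(sample))  # (1300, 1500)
```

Treat the result as a planning figure only. Code, non-English text, and unusual formatting can land well outside this range.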
Why Tokens Exist
Language models do not read text the way you do. They process discrete numerical representations of text fragments. The process of breaking text into these fragments is called tokenization, and it happens before the model sees your input.
OpenAI's GPT models tokenize text with a method called byte-pair encoding (BPE), implemented in OpenAI's open-source tiktoken library. BPE starts with the smallest units of text (individual bytes or characters) and progressively merges the most frequent adjacent pairs into new tokens. After enough merges on a large corpus, the most common English words become single tokens. Less common words or words from other languages get split into multiple tokens.
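The merge loop described above can be sketched in a few lines. This is a toy illustration under simplifying assumptions, not tiktoken's implementation: it works on characters rather than bytes, skips the pre-tokenization step that production tokenizers apply, and uses a five-word corpus. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def most_frequent_pair(words: List[List[str]]) -> Optional[Tuple[str, str]]:
    """Return the most common adjacent symbol pair, or None if no pairs remain."""
    pairs = Counter()
    for symbols in words:
        pairs.update(zip(symbols, symbols[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(symbols: List[str], pair: Tuple[str, str]) -> List[str]:
    """Replace every occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_toy_bpe(corpus: List[str], num_merges: int):
    """Learn up to `num_merges` merges; return the merges and the segmented corpus."""
    words = [list(word) for word in corpus]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(symbols, pair) for symbols in words]
    return merges, words


merges, segmented = train_toy_bpe(["the", "the", "the", "then", "zebra"], num_merges=2)
print(merges)
print(segmented)  # "the" ends up as one symbol; "zebra" stays split into characters
```

After two merges the frequent word "the" has become a single symbol, while the rare word "zebra" is still five separate characters. The same pattern, at much larger scale, explains why common words cost one token and rare words cost several.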
Some illustrative examples of GPT-4 tokenization (exact counts depend on the encoding and on whether the word follows a space):
- "ChatGPT" = 1 token
- "artificial" = 2 tokens ("art" + "ificial")
- "intelligence" = 3 tokens
- "the" = 1 token
- "supercalifragilistic" = 6 tokens
- A common word together with its leading space = 1 token
You can verify any of these using OpenAI's Tokenizer tool (platform.openai.com/tokenizer), which shows exactly how a string is divided.
Why Tokens Matter for Cost
Every major LLM API charges per token, not per request, character, or word. Understanding the token count of your prompts is the fastest way to reduce your API costs.
GPT-4o pricing as of May 2026 (OpenAI pricing page): $5.00 per million input tokens, $15.00 per million output tokens. Claude 3.5 Sonnet (Anthropic pricing page): $3.00 per million input tokens, $15.00 per million output tokens.
A system prompt that is 500 tokens long, sent with every API request, adds 500 tokens to every single call. If you make 100,000 calls per month, that is 50 million tokens, or $250 per month just from your system prompt at GPT-4o input pricing. Trimming 200 tokens from that system prompt saves $100 per month.
At scale, this compounds. A product with 10,000 active users making 20 API calls per day is making 200,000 calls per day. Every 100 tokens of unnecessary content in your prompt adds 20 million tokens per day, which costs approximately $100 per day at GPT-4o input pricing.
The practical rule: be concise in your prompts. Every unnecessary sentence costs money at scale.
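The system prompt arithmetic in this section can be reproduced with a short helper. This is a minimal sketch: the function name is hypothetical, and the $5.00 rate is simply the GPT-4o input price quoted above, so check the provider's pricing page before relying on it.

```python
def monthly_prompt_cost(tokens_per_call: int, calls_per_month: int,
                        usd_per_million_tokens: float) -> float:
    """Cost in USD of sending `tokens_per_call` input tokens on every call."""
    total_tokens = tokens_per_call * calls_per_month
    return total_tokens / 1_000_000 * usd_per_million_tokens


GPT_4O_INPUT_USD_PER_MILLION = 5.00  # rate quoted in the text; verify current pricing

full = monthly_prompt_cost(500, 100_000, GPT_4O_INPUT_USD_PER_MILLION)
saved = monthly_prompt_cost(200, 100_000, GPT_4O_INPUT_USD_PER_MILLION)
print(f"System prompt: ${full:.2f}/month; trimming 200 tokens saves ${saved:.2f}/month")
# System prompt: $250.00/month; trimming 200 tokens saves $100.00/month
```

Plugging in your own call volume and rate makes it easy to see whether trimming a prompt is worth the engineering time.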
Why Tokens Matter for Context Limits
Every model has a maximum context window measured in tokens. GPT-4o supports 128,000 tokens. Claude 3.5 Sonnet supports 200,000. Gemini 1.5 Pro supports 1,000,000.
The context window includes your system prompt, your conversation history, the documents you attach, and the model's responses. All of it counts. When the window fills up, the API typically rejects the request or cuts the response short, or the application in front of it starts dropping the earliest content.
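One common way for an application to handle a full window is to drop the oldest messages until the rest fits. The sketch below shows that idea. The function names are hypothetical, and `rough_count` is a placeholder that counts one token per word, so swap in the target model's tokenizer for real use.

```python
from typing import Callable, List


def trim_history(messages: List[str], max_tokens: int,
                 count_tokens: Callable[[str], int]) -> List[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the earliest content is dropped first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text: str) -> int:
    """Placeholder counter: one token per word. Replace with a real tokenizer."""
    return len(text.split())


history = ["first message here", "second message", "third and latest message"]
print(trim_history(history, max_tokens=6, count_tokens=rough_count))
# ['second message', 'third and latest message']
```

This sketch treats every message the same. A real application would usually keep the system prompt pinned and trim only the conversation turns.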
Knowing the token count of what you are sending helps you plan. A 50-page PDF is approximately 25,000 to 35,000 tokens, so you can fit about four of them in a single GPT-4o request. A full codebase might be 200,000 tokens, which exceeds GPT-4o's limit and only just reaches Claude 3.5 Sonnet's 200,000-token window, leaving no room for a prompt or a response.
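The planning arithmetic above can be written as two small checks. This is a sketch under stated assumptions: the function names are hypothetical, the window sizes are the ones quoted in this section, and the 4,000-token output reserve is an arbitrary example value.

```python
def fits_in_context(input_tokens: int, context_window: int,
                    reserved_for_output: int = 0) -> bool:
    """True if the input plus the reserved output budget fits in the window."""
    return input_tokens + reserved_for_output <= context_window


def max_documents(tokens_per_document: int, context_window: int,
                  reserved_for_output: int = 0) -> int:
    """How many equally sized documents fit alongside the output reserve."""
    available = context_window - reserved_for_output
    return max(available // tokens_per_document, 0)


GPT_4O_WINDOW = 128_000            # context sizes quoted in the text
CLAUDE_35_SONNET_WINDOW = 200_000

print(max_documents(30_000, GPT_4O_WINDOW, reserved_for_output=4_000))  # 4
print(fits_in_context(200_000, GPT_4O_WINDOW))                          # False
print(fits_in_context(200_000, CLAUDE_35_SONNET_WINDOW))                # True
print(fits_in_context(200_000, CLAUDE_35_SONNET_WINDOW,
                      reserved_for_output=4_000))                       # False
```

The last line shows why it is worth reserving space for the response: input that exactly fills the window leaves the model nothing to answer with.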
OpenAI publishes tiktoken as a standalone Python package (installed separately from the openai client library with pip install tiktoken) that you can use to count tokens before sending:
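import tiktoken  # installed separately: pip install tiktoken

# Look up the encoding that matches the target model
enc = tiktoken.encoding_for_model("gpt-4o")
your_text = "Replace this with the prompt or document you plan to send."
token_count = len(enc.encode(your_text))
print(f"Token count: {token_count}")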
This lets you check token counts in your application code before hitting API limits.
How Tokenization Differs Between Models
GPT models use tiktoken's BPE tokenizer. Claude uses a similar approach but with different merge rules and vocabulary, which means the same text may produce slightly different token counts between GPT-4o and Claude. The difference is usually within 5 to 10 percent.
Gemini uses a SentencePiece tokenizer, which takes a different approach, particularly for non-Latin scripts and technical notation. For most English text, the differences are small. For code, mathematical expressions, or non-English text, tokenization differences between models can be more significant.
One practical consequence: if you are switching from one model provider to another, do not assume your context window calculations carry over exactly. Recount your tokens using the target model's tokenizer.
Why Tokenization Affects Prompt Quality
Because the model processes tokens, not words or meanings, the way you write prompts can affect quality in subtle ways.
Rare compound words or highly technical terms get split into multiple tokens that may not carry the full meaning of the compound. If you use a highly specialized term that appeared rarely in training data, the model may have learned the component tokens but not the combined concept.
Similarly, inconsistent formatting in your prompts can create unexpected token boundaries. Extra whitespace, unusual punctuation, or unconventional capitalization can split tokens in ways that degrade the model's understanding of your intent.
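One way to guard against stray whitespace is to normalize prompts before sending them. The helper below is a minimal sketch with a hypothetical name. It assumes whitespace carries no meaning in your prompt, so do not apply it to prompts that contain code or other whitespace-sensitive content.

```python
import re


def normalize_prompt(prompt: str) -> str:
    """Collapse stray whitespace while keeping paragraph breaks."""
    # Reduce runs of spaces and tabs to one space and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    # Squeeze runs of three or more newlines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


messy = "Summarize   the report.\t \n\n\n\nUse  three bullet points.  "
print(repr(normalize_prompt(messy)))
# 'Summarize the report.\n\nUse three bullet points.'
```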
The practical guidance: write prompts in clear, standard English. Use technical terms consistently. Avoid unusual formatting unless it is specifically helpful for structure. These habits make prompts cheaper and more reliable.