What Is a Token in an LLM? A Plain-English Explanation
A token is not a word. It is a text chunk of 1-4 characters. Understanding tokens directly reduces your API costs and improves how you structure prompts.
A token in a large language model is a chunk of text that the model treats as a single unit of processing. It is not a word, not a character, and not a syllable, though it can be any of those things depending on the model and the word. In practice, one token is roughly three to four characters of English text. A 1,000-word document is approximately 1,300 to 1,500 tokens. Understanding this one fact will reduce your API costs and make you better at writing prompts.
Why Tokens Exist
Language models do not read text the way you do. They process discrete numerical representations of text fragments. The process of breaking text into these fragments is called tokenization, and it happens before the model sees your input.
The tokenizer used by GPT models is called tiktoken, and it uses a method called byte-pair encoding (BPE). BPE starts with individual characters and progressively merges the most frequent character pairs into new tokens. After enough merges on a large corpus, the most common English words become single tokens. Less common words or words from other languages get split into multiple tokens.
Some concrete examples of GPT-4 tokenization:
"ChatGPT" = 1 token
"artificial" = 2 tokens ("art" + "ificial")
"intelligence" = 3 tokens
"the" = 1 token
"supercalifragilistic" = 6 tokens
A typical space + word combination = 1 token for common words
You can verify any of these using OpenAI's Tokenizer tool (platform.openai.com/tokenizer), which shows exactly how a string is divided.
Why Tokens Matter for Cost
Every major LLM API charges per token, not per request, character, or word. Understanding the token count of your prompts is the fastest way to reduce your API costs.
GPT-4o pricing as of May 2026 (OpenAI pricing page): $5.00 per million input tokens, $15.00 per million output tokens. Claude 3.5 Sonnet (Anthropic pricing page): $3.00 per million input tokens, $15.00 per million output tokens.
A system prompt that is 500 tokens long, sent with every API request, adds 500 tokens to every single call. If you make 100,000 calls per month, that is 50 million tokens, or $250 per month just from your system prompt at GPT-4o input pricing. Trimming 200 tokens from that system prompt saves $100 per month.
At scale, this compounds. A product with 10,000 active users making 20 API calls per day is making 200,000 calls per day. Every 100 tokens of unnecessary content in your prompt costs approximately $300 per day at GPT-4o input pricing.
The practical rule: be concise in your prompts. Every unnecessary sentence costs money at scale.
Team workspace
Ship faster with chat, meetings, and projects in one place — Zlyqor.
Every model has a maximum context window measured in tokens. GPT-4o supports 128,000 tokens. Claude 3.5 Sonnet supports 200,000. Gemini 1.5 Pro supports 1,000,000.
The context window includes your system prompt, your conversation history, the documents you attach, and the model's responses. All of it counts. When the window fills up, the model either stops processing or begins dropping the earliest content.
Knowing the token count of what you are sending helps you plan. A 50-page PDF is approximately 25,000 to 35,000 tokens. You can fit about four of them in a single GPT-4o request. A full codebase might be 200,000 tokens, which exceeds GPT-4o's limit but fits in Claude's.
The OpenAI Python library includes a tiktoken package you can use to count tokens before sending:
This lets you check token counts in your application code before hitting API limits.
How Tokenization Differs Between Models
GPT models use tiktoken's BPE tokenizer. Claude uses a similar approach but with different merge rules and vocabulary, which means the same text may produce slightly different token counts between GPT-4o and Claude. The difference is usually within 5 to 10 percent.
Gemini uses a SentencePiece tokenizer, which takes a different approach, particularly for non-Latin scripts and technical notation. For most English text, the differences are small. For code, mathematical expressions, or non-English text, tokenization differences between models can be more significant.
One practical consequence: if you are switching from one model provider to another, do not assume your context window calculations carry over exactly. Recount your tokens using the target model's tokenizer.
Why Tokenization Affects Prompt Quality
Because the model processes tokens, not words or meanings, the way you write prompts can affect quality in subtle ways.
Rare compound words or highly technical terms get split into multiple tokens that may not carry the full meaning of the compound. If you use a highly specialized term that appeared rarely in training data, the model may have learned the component tokens but not the combined concept.
Similarly, inconsistent formatting in your prompts can create unexpected token boundaries. Extra whitespace, unusual punctuation, or unconventional capitalization can split tokens in ways that degrade the model's understanding of your intent.
The practical guidance: write prompts in clear, standard English. Use technical terms consistently. Avoid unusual formatting unless it is specifically helpful for structure. These habits make prompts cheaper and more reliable.
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace - chat, projects, time tracking, AI meeting summaries, and invoicing - in one tool. Try it free.
Practical deep-dives on LLMs, developer tools, and AI engineering. No filler. Unsubscribe any time.
// written byFIG. AUTH-01
530
Mahmudul Haque Qudrati
CEO & ML Engineer
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.
What Is OpenAI Frontier Models and Codex on AWS? A Practical Overview
OpenAI's frontier models and Codex are now available on AWS through Amazon Bedrock and SageMaker. This post covers what's included, how it works, and the practical tradeoffs for teams considering this integration.