LLMs hallucinate because they are trained to predict the most plausible next token, not the most accurate one. There is no internal fact-checker, no memory of sources, and no signal that distinguishes "I know this" from "I am generating something that sounds like what I know." Understanding why hallucination happens, not just that it happens, is what lets you design workflows that are genuinely reliable.
The three main hallucination types are confabulation (inventing facts), sycophancy (agreeing with false premises), and outdated knowledge (confidently stating things that were true at training time but are no longer true). Each has different causes and different mitigations.
Why Hallucination Is Structural, Not a Bug
Every LLM generates output one token at a time. At each step, the model produces a probability distribution over its vocabulary and samples from that distribution. It does not consult a database. It does not verify claims against source documents. It does not have a "check if this is true" step.
This architecture is powerful because it allows the model to generalize across domains, synthesize information, and produce coherent prose. But it also means the model will produce confident-sounding text even in regions of its probability space where it has little actual training signal. When you ask about a specific paper, a specific date, or a specific technical detail, the model generates what would plausibly follow your question based on patterns in training data. If the training data contained similar-sounding information but not the exact fact you asked about, you get a hallucination that is plausible but wrong.
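The sampling step described above can be sketched with a toy distribution. The probabilities below are invented for illustration; a real model computes them from its weights, and `sample_next_token` is a hypothetical helper, not part of any library. The point is that nothing in the loop checks whether the chosen token is true.

```python
import random

# Toy next-token distribution for the prompt
# "The capital of Australia is". Numbers are made up for illustration.
NEXT_TOKEN_PROBS = {
    "Canberra": 0.55,
    "Sydney": 0.35,
    "Melbourne": 0.10,
}

def sample_next_token(probs, rng):
    """Pick one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]

# Count how often a plausible but wrong token was emitted.
wrong = sum(1 for t in samples if t != "Canberra")
print(f"wrong answers in 1000 samples: {wrong}")
```

Every wrong sample here is produced by exactly the same mechanism as every right one, which is why the failure is structural.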
TruthfulQA is a benchmark specifically designed to measure whether models give truthful answers to questions where humans frequently have misconceptions (Lin et al., "TruthfulQA: Measuring How Models Imitate Human Falsehoods," ACL 2022). Scores on TruthfulQA (0-shot, GPT metric):
- GPT-4: ~59%
- Claude 2: ~66%
- Llama 2 (70B): ~52%
- GPT-3.5 Turbo: ~47%
(Papers With Code, TruthfulQA leaderboard, data from Lin et al. and subsequent evaluations)
Even the strongest models listed here answer only about 60 to 66 percent of TruthfulQA questions truthfully, on a benchmark built specifically to probe this weakness. In general use, where questions are not adversarially crafted to elicit falsehoods, accuracy is higher, but these scores show that a substantial error rate persists even in capable models.
Type 1: Confabulation
Confabulation is the most common type: the model invents specific facts. Paper citations that do not exist. Statistics with no source. Names, dates, product features, API parameters that sound right but are not.
What causes it: the model has seen thousands of papers, and papers follow predictable patterns in their titles and author lists. When asked about a paper on a specific topic, the model generates a plausible title and plausible author names. The output is statistically reasonable given training data patterns. It is just not a real paper.
How to reduce it:
Be explicit about uncertainty tolerance. System prompt: "If you are not confident about a specific fact, citation, or number, say so. Do not invent citations. If you cannot recall a source, indicate that the information should be verified." This does not eliminate confabulation but shifts the model's probability distribution toward uncertainty expressions.
Use RAG (retrieval-augmented generation). Instead of asking the model to recall facts, give it the relevant documents and ask it to answer based only on those documents. The probability of confabulating facts in a document it just read is dramatically lower than the probability of confabulating facts from training memory.
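A minimal sketch of the prompt-assembly half of RAG, assuming the relevant documents have already been retrieved. The function name `build_grounded_prompt`, the document IDs, and the instruction wording are all hypothetical; the retrieval step itself (search, embeddings, ranking) is out of scope here.

```python
def build_grounded_prompt(question, documents):
    """Assemble a prompt that restricts the answer to the supplied documents.

    `documents` maps a hypothetical document ID to its text.
    """
    # Label each document so the model can refer to it by ID.
    sections = [f"[{doc_id}]\n{text}" for doc_id, text in documents.items()]
    context = "\n\n".join(sections)
    return (
        "Answer the question using only the documents below. "
        "If the documents do not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

docs = {
    "doc-1": "Canberra is the capital of Australia.",
    "doc-2": "Sydney is the largest city in Australia.",
}
prompt = build_grounded_prompt("What is the capital of Australia?", docs)
print(prompt)
```

Including an explicit "say so" escape hatch matters: without it, a model asked to answer only from documents that lack the answer has no sanctioned way to decline.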
Lower the temperature. Temperature controls how much randomness is in the sampling process. At temperature 0, the model always picks the highest-probability next token. At higher temperatures, it explores lower-probability options. For factual tasks, temperature 0 reduces confabulation by keeping the model in the highest-confidence regions of its probability space.
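To make the effect of temperature concrete, here is a small sketch of temperature-scaled softmax over invented scores for three candidate tokens. The function name and the logits are assumptions for illustration, and treating temperature 0 as pure argmax is a common convention rather than something every API guarantees.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for t in (0, 0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass moves from the top-scoring token to the alternatives. Note the limit of this lever: if the top-scoring token is itself wrong, lowering the temperature only makes the model more consistently wrong.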
Ask for sources as part of the output format. "Respond with the answer and then list the sources or documents this is based on." When the model cannot cite a real source, it either cites a hallucinated one (which you can then verify is false) or it indicates it cannot source the claim. Either way, it makes the confabulation visible.
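Once the model is asked to cite sources, the citations can be checked mechanically. The sketch below assumes citations appear in the answer as bracketed IDs such as [doc-3] and that you hold the set of IDs you actually supplied; both the format and the helper name `find_unverified_citations` are hypothetical.

```python
import re

def find_unverified_citations(answer, known_ids):
    """Return cited IDs in the answer that are not in the known set.

    Assumes citations look like [doc-3]; this format is illustrative only.
    """
    cited = re.findall(r"\[([A-Za-z0-9_-]+)\]", answer)
    unknown = []
    for doc_id in cited:
        # Keep first-seen order and skip duplicates.
        if doc_id not in known_ids and doc_id not in unknown:
            unknown.append(doc_id)
    return unknown

answer = "Canberra is the capital [doc-1]. It was founded in 1913 [doc-7]."
print(find_unverified_citations(answer, {"doc-1", "doc-2"}))  # ['doc-7']
```

This only confirms that a cited source exists. Whether the source actually supports the claim still needs a human or a second checking step.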
Type 2: Sycophancy
Sycophancy is the tendency to agree with the user's stated position even when that position is incorrect. It is less obviously harmful than confabulation but in some contexts is more dangerous.
Research on sycophancy in LLMs (Sharma et al., "Towards Understanding Sycophancy in Language Models," Anthropic, 2023) found that RLHF-trained models can learn to prioritize user approval over accuracy, because in human evaluation, agreeable responses often receive higher ratings.
The practical consequence: if you write "I believe the capital of Australia is Sydney" and then ask for confirmation, many models will find a way to partially agree or soften the correction rather than directly stating that you are wrong and Canberra is the capital.
How to reduce it:
Explicitly instruct the model to disagree. System prompt: "If the user states something incorrect, correct them directly and clearly. Do not soften corrections to avoid disagreement." This works imperfectly but does reduce the sycophancy rate.
Use adversarial framing in your evaluation. When using LLMs to review documents, code, or plans, tell the model to assume the work contains mistakes and to find them. "Assume this plan has at least three significant problems. What are they?" is more reliable than "Review this plan."
Separate generation from evaluation. Generate a response, then have a second prompt evaluate whether the first response is accurate. The evaluation step is less sycophantic because it is not responding to a user's stated position.
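A two-pass pipeline along these lines might look like the sketch below. `call_model` is a hypothetical stand-in for whatever client you use (prompt in, text out), and `fake_model` is a stub so the example runs offline; neither reflects a real API. The review prompt borrows the adversarial framing from the previous tip.

```python
from typing import Callable

def generate_then_evaluate(question: str, call_model: Callable[[str], str]) -> dict:
    """Run a generation pass, then a separate evaluation pass.

    `call_model` is a hypothetical function: prompt in, model text out.
    """
    draft = call_model(question)
    # The reviewer sees the draft as anonymous text, not as the user's view.
    review_prompt = (
        "Assume the answer below contains errors. "
        "List each factual claim that is wrong or unverifiable.\n\n"
        f"Question: {question}\n"
        f"Answer: {draft}"
    )
    review = call_model(review_prompt)
    return {"draft": draft, "review": review}

# Stub standing in for a real model call, so the sketch runs offline.
def fake_model(prompt: str) -> str:
    if prompt.startswith("Assume the answer below"):
        return "Claim 'Sydney is the capital' is wrong."
    return "Sydney is the capital of Australia."

result = generate_then_evaluate("What is the capital of Australia?", fake_model)
print(result["review"])
```

Keeping the two calls separate means the evaluation prompt carries no stated user position for the model to agree with.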
Use Claude for tasks where honest uncertainty matters. Anthropic's Constitutional AI training has made Claude somewhat more resistant to sycophancy than models trained with pure RLHF. This is not a guarantee, but it is a real difference.
Type 3: Outdated Knowledge
Every model has a training cutoff. Events, products, papers, regulations, and prices after that cutoff are unknown to the model. When asked about them, the model may admit it does not know (correct behavior), or it may generate something plausible based on pre-cutoff patterns (hallucination).
Approximate training cutoffs, as of May 2026: GPT-4o has knowledge through early 2024. Claude 3.5 Sonnet's cutoff is approximately early 2024. Gemini 1.5 Pro's cutoff is approximately mid-2024. DeepSeek V3's cutoff is approximately mid-2024.
For current events, recent research, pricing, regulations that changed in the last year, or any rapidly evolving technical field, do not rely on the model's internal knowledge. Provide the relevant current information in the context window.
How to reduce it:
Tell the model when the query requires current information. "The following question requires information from 2025 or later. Use only the documents I provide, not your training knowledge."
Give the model a date and frame expectations. "Today is May 17, 2026. If your training data does not cover events after early 2024, indicate that clearly rather than guessing."
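The date framing can be generated rather than typed by hand, so it never goes stale. A minimal sketch, assuming you know roughly where the model's cutoff falls; `date_preamble` is a hypothetical helper and the ISO date format is a choice made here to avoid locale-dependent month names.

```python
from datetime import date

def date_preamble(today: date, cutoff_description: str) -> str:
    """Build a preamble stating the current date and the assumed cutoff."""
    return (
        f"Today is {today.isoformat()}. "
        f"If your training data does not cover events after {cutoff_description}, "
        "indicate that clearly rather than guessing."
    )

# In production you would pass date.today(); a fixed date keeps this repeatable.
print(date_preamble(date(2026, 5, 17), "early 2024"))
```

Prepending this to the system prompt gives the model both facts it lacks by default: what day it is, and how large the gap since its training data may be.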
Use RAG or browsing tools. Models with search capabilities (GPT-4o with browsing, Perplexity) can retrieve current information. For applications you control, build a retrieval pipeline rather than relying on model memory.
When Hallucination Is Acceptable
Not all hallucination risk is equal. For creative writing, brainstorming, and generating rough drafts that a human will review, some level of hallucination is entirely tolerable. The human review step catches the errors.
For tasks where errors have real consequences (medical advice, legal interpretation, financial analysis, code that runs in production), hallucination is intolerable and your workflow must include verification steps. The LLM should be a first-pass generator, not a final authority.
The framework I use: classify each LLM use case by the cost of an undetected error. Low cost = higher hallucination tolerance. High cost = build in verification. No LLM use case should have zero human review unless the output is entirely inconsequential.
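One way to make this classification explicit is to encode it, so every new use case has to be assigned a cost before it ships. The sketch below is an illustration only: the two cost levels follow the text, but the policy labels, the example use cases, and the `review_policy` helper are assumptions.

```python
from enum import Enum

class ErrorCost(Enum):
    LOW = "low"    # e.g. brainstorming, rough drafts
    HIGH = "high"  # e.g. medical, legal, financial, production code

def review_policy(cost: ErrorCost, inconsequential: bool = False) -> str:
    """Map the cost of an undetected error to a review requirement."""
    if cost is ErrorCost.HIGH:
        return "verify before use"
    if inconsequential:
        # Only entirely inconsequential output may skip review.
        return "no review needed"
    return "human review of final output"

use_cases = {
    "brainstorm blog titles": ErrorCost.LOW,
    "summarize contract terms": ErrorCost.HIGH,
}
for name, cost in use_cases.items():
    print(f"{name}: {review_policy(cost)}")
```

The default path ends in human review, so skipping review requires an explicit decision that the output is inconsequential.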