Temperature is a number you pass to an LLM API that controls how random the output is. Temperature 0 means the model always picks the highest-probability next token, producing deterministic output. Temperature 1.0 allows meaningful randomness, producing varied output. Temperature above 1.5 produces increasingly incoherent output. Understanding temperature, along with the related sampling parameters top-p and top-k, gives you direct control over the reliability and creativity of LLM responses.
How Token Selection Actually Works
Before explaining temperature, you need to understand what the model is doing at each step.
At every position in the output, the model produces a probability distribution over its entire vocabulary (on the order of 100,000 tokens for recent GPT models). The entry for each token is a probability: the model's estimate of how likely that token is to come next given everything before it.
In a simple factual context, the distribution might be very peaked: "The capital of France is" produces a 97% probability for "Paris," a 1% probability for "the," and tiny probabilities for everything else. In a creative context, the distribution might be flatter: "She opened the door and saw" might assign 15% probability to "a," 12% to "her," 10% to "nothing," and so on.
Temperature applies a transformation to this distribution before sampling. A temperature of 1.0 uses the distribution as-is. A temperature below 1.0 sharpens the distribution (makes high-probability tokens even more likely). A temperature above 1.0 flattens it (makes low-probability tokens more competitive).
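One common way to implement this transformation is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with softmax. The sketch below assumes that formulation. The function name and the toy logits are illustrative and do not come from any particular API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as a special case (pick the argmax).
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate tokens.
toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

With these toy values, the top token's share shrinks as the temperature rises and the least likely token's share grows, which is the sharpening and flattening described above.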
Concrete Examples of Temperature in Practice
Consider the prompt: "The best programming language for beginners is"
Temperature 0.0: "Python" — the model's highest-probability completion, repeated every time. Deterministic.
Temperature 0.5: "Python" most of the time, occasionally "JavaScript" or "Scratch." Some variation but still mostly the highest-probability options.
Temperature 1.0: "Python," "JavaScript," "Ruby," "Scratch," "Java" — meaningful variety across multiple runs. All plausible answers.
Temperature 1.5: "Python," "Lua," "an interesting question," "BASIC," "debated" — variety extends to less likely tokens. Some completions are odd.
Temperature 2.0: Increasingly incoherent. The model might complete with tokens that have no semantic connection to the prompt, because flattening the distribution enough makes rare tokens competitive with common ones.
The practical rule: never go above 1.0 for production applications. The gains in variety are outweighed by the degradation in coherence.
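To get a feel for the progression above, the following sketch samples repeatedly from a small made-up distribution at several temperatures. The token probabilities are invented for illustration and are not measurements from any real model. Raising probabilities to the power 1/temperature and renormalizing is assumed here as the temperature transformation (it is equivalent to dividing logits by the temperature).

```python
import random
from collections import Counter

def reweight(probs, temperature):
    """Apply temperature to a {token: probability} dict and renormalize."""
    if temperature == 0:
        # Greedy case: all mass goes to the single most likely token.
        best = max(probs, key=probs.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in probs}
    powered = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(powered.values())
    return {tok: v / total for tok, v in powered.items()}

def sample_counts(probs, temperature, n=1000, seed=0):
    """Draw n samples at the given temperature and count each token."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    adjusted = reweight(probs, temperature)
    tokens = list(adjusted)
    weights = [adjusted[t] for t in tokens]
    return Counter(rng.choices(tokens, weights=weights, k=n))

# Invented next-token probabilities for the prompt above.
toy = {"Python": 0.60, "JavaScript": 0.20, "Scratch": 0.10,
       "Ruby": 0.05, "Lua": 0.03, "debated": 0.02}

for t in (0.0, 0.5, 1.0, 2.0):
    print(f"temperature={t}:", sample_counts(toy, t).most_common())
```

At temperature 0 every sample is the top token. As the temperature climbs, the counts spread toward the unlikely completions.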
Top-p (Nucleus Sampling)
Top-p, also called nucleus sampling, is a different way to control randomness. Instead of scaling all token probabilities, top-p selects from only the smallest set of tokens whose combined probability exceeds p.
With top-p = 0.9: rank all tokens by probability. Sum their probabilities from highest to lowest. Stop when the running sum exceeds 0.9. Sample only from that set.
In a peaked distribution (high-confidence context), top-p = 0.9 might include only 5 to 10 tokens. In a flat distribution (low-confidence context), it might include hundreds.
The key difference from temperature: top-p adapts to the model's confidence. When the model is confident, it samples from a small set. When uncertain, it samples from a larger set. Temperature applies the same scaling regardless of confidence.
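The selection procedure can be sketched in a few lines. This is a minimal illustration and not any provider's actual implementation. The function name and both toy distributions are assumptions made for the example, and the cutoff here stops as soon as the running sum reaches p.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, running = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        running += prob
        if running >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Toy peaked distribution (confident context).
peaked = {"Paris": 0.97, "the": 0.01, "a": 0.01, "located": 0.01}

# Toy flat distribution (uncertain context); values sum to 1.
flat = {"a": 0.15, "her": 0.12, "nothing": 0.10, "the": 0.09,
        "him": 0.08, "it": 0.08, "two": 0.07, "an": 0.07,
        "that": 0.06, "something": 0.06, "what": 0.06, "no": 0.06}

print(len(top_p_filter(peaked, 0.9)), "token(s) kept in the peaked case")
print(len(top_p_filter(flat, 0.9)), "token(s) kept in the flat case")
```

The same p value keeps a single token in the confident case and most of the candidates in the uncertain case, which is the adaptive behavior described above.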
When to use top-p:
- top-p = 0.9 to 0.95 is a sensible default for most tasks
- Lower top-p (0.7 to 0.85) for tasks where you want more predictable output
- Most API providers recommend using either temperature or top-p, not both simultaneously
Top-k
Top-k limits the sampling pool to the k most probable tokens at each step, regardless of their cumulative probability.
Top-k = 1 is identical to temperature 0: always pick the highest-probability token. Top-k = 50 means the model only ever samples from the 50 most likely tokens, even if the 51st would be a better creative choice.
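Top-k is less commonly used than temperature and top-p because it does not adapt to context. A fixed top-k = 50 behaves very differently in a confident context versus an uncertain one: when the model is confident, most of those 50 tokens are implausible candidates that sampling can still land on, and when it is uncertain, the cutoff may exclude reasonable options.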
When to use top-k: Some providers offer it as an alternative to top-p. If you are using top-k, values between 20 and 100 are typical for generation tasks.
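A small sketch makes the lack of adaptation visible. The function names and toy distributions below are assumptions for illustration only. The helper `kept_mass` reports how much of the original probability survives the cutoff.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

def kept_mass(probs, k):
    """Fraction of the original probability mass covered by the top k tokens."""
    return sum(sorted(probs.values(), reverse=True)[:k])

# Toy peaked distribution (confident context).
peaked = {"Paris": 0.97, "the": 0.01, "a": 0.01, "located": 0.01}

# Toy flat distribution (uncertain context); values sum to 1.
flat = {"a": 0.15, "her": 0.12, "nothing": 0.10, "the": 0.09,
        "him": 0.08, "it": 0.08, "two": 0.07, "an": 0.07,
        "that": 0.06, "something": 0.06, "what": 0.06, "no": 0.06}

for name, dist in (("peaked", peaked), ("flat", flat)):
    print(name, "k=3 keeps", round(kept_mass(dist, 3), 2), "of the mass")
```

With k = 3, the peaked case keeps two tokens the model considers very unlikely, while the flat case discards most of the plausible candidates.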
Practical Settings by Use Case
Fact retrieval, classification, and structured output (temperature 0 or close): Set temperature to 0.0. You want deterministic output. The model's highest-probability answer to a factual question is its best answer. Randomness here produces variety without adding value.
Example: extracting dates from documents, classifying support tickets into categories, generating SQL from a natural language query.
Writing, analysis, and summaries (temperature 0.3 to 0.7): Some variety improves quality for writing tasks. Different runs produce different phrasings, and you can choose the best. Temperature 0.5 to 0.7 is a good default for tasks where quality output is more important than strict consistency.
Example: drafting emails, summarizing meeting notes, writing product descriptions.
Brainstorming and creative writing (temperature 0.7 to 1.0): Higher temperature produces more varied and unexpected outputs. For generating multiple options, brainstorming names, or creative writing, temperature 0.8 to 1.0 gives useful variety without degrading into incoherence.
Example: generating five alternative taglines for a product, brainstorming plot ideas, writing first drafts of marketing copy.
Code generation (temperature 0 to 0.2): Code has correct and incorrect answers. Higher temperature increases the chance of generating technically incorrect code that looks plausible. For code generation, keep temperature low.
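One way to keep these settings consistent across a codebase is a small lookup table. The preset names and the specific values below are hypothetical choices picked from within the ranges above. Treat them as starting points to tune, not as fixed recommendations.

```python
# Hypothetical presets chosen from the ranges discussed above.
TEMPERATURE_PRESETS = {
    "fact_retrieval": 0.0,
    "classification": 0.0,
    "structured_output": 0.0,
    "code_generation": 0.1,
    "writing": 0.5,
    "summarization": 0.5,
    "brainstorming": 0.9,
    "creative_writing": 0.9,
}

def temperature_for(task):
    """Look up the preset temperature for a task, failing loudly on typos."""
    try:
        return TEMPERATURE_PRESETS[task]
    except KeyError:
        known = ", ".join(sorted(TEMPERATURE_PRESETS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None

print(temperature_for("code_generation"))
```

Raising an error for unknown task names, instead of silently falling back to a default, is a design choice that keeps a misspelled task from running at an unintended temperature.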
The Common Mistake: High Temperature and High Top-p Together
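Setting temperature to 1.5 and top-p to 0.95 at the same time compounds randomness instead of containing it. Assuming temperature is applied before the top-p cutoff, as is common, the temperature first flattens the probability distribution, making rare tokens more competitive. The top-p cutoff then has to admit far more tokens to reach 0.95 of the flattened mass than it would at temperature 1.0, so it removes little of the added noise. The result is far more randomness than top-p = 0.95 produces at temperature 1.0, and the high top-p offers little protection against the incoherence that temperature 1.5 causes.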
Most providers recommend treating temperature and top-p as alternatives, not complements. If you are using temperature to control randomness, set top-p to 1.0 (no restriction) and adjust temperature only. If you are using top-p, set temperature to 1.0 and adjust top-p only.
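The interaction between the two parameters can be illustrated by counting how many tokens a top-p cutoff keeps before and after raising the temperature. The sketch below assumes temperature is applied before the top-p cutoff, and it uses an invented 1,000-token distribution whose probabilities fall off with the square of the rank. Neither assumption describes a specific provider or model.

```python
def apply_temperature(probs, temperature):
    """Raise each probability to 1/temperature and renormalize (temperature > 0)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [v / total for v in powered]

def nucleus_size(probs, top_p):
    """Count the tokens needed, most likely first, to reach top_p of the mass."""
    running = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        running += prob
        if running >= top_p:
            return count
    return len(probs)

# Invented long-tailed distribution over 1,000 tokens.
weights = [1.0 / rank ** 2 for rank in range(1, 1001)]
weight_sum = sum(weights)
base_probs = [w / weight_sum for w in weights]

for t in (1.0, 1.5):
    size = nucleus_size(apply_temperature(base_probs, t), top_p=0.95)
    print(f"temperature={t}: top-p 0.95 keeps {size} tokens")
```

On this toy distribution the sampling pool at temperature 1.5 is many times larger than at temperature 1.0, even though top-p never changed.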
A Note on Determinism
Temperature 0 is described as deterministic, but in practice you may still see occasional output variation at temperature 0 because of floating-point differences across hardware and in distributed inference systems. For most applications this does not matter. For applications where reproducibility is critical, also set a fixed random seed if the API supports one. A seed makes the sampling step repeatable at nonzero temperatures, but it does not remove numerical differences between machines, so treat reproducibility as best-effort.
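As a sketch of how these settings might be pinned in one place, the hypothetical helper below builds a dictionary of sampling parameters. The key names `temperature`, `top_p`, and `seed` are assumptions modeled on names commonly seen in provider APIs. Check your provider's documentation for the fields it actually accepts, including whether it supports a seed at all.

```python
def build_sampling_params(temperature=None, top_p=None, seed=None):
    """Assemble sampling settings, rejecting simultaneous temperature and top-p tuning."""
    tuned_temperature = temperature is not None and temperature != 1.0
    tuned_top_p = top_p is not None and top_p != 1.0
    if tuned_temperature and tuned_top_p:
        raise ValueError("adjust temperature or top_p, not both")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    if seed is not None:
        params["seed"] = seed  # only meaningful if the provider supports seeding
    return params

# Settings for a task where repeatable output matters.
print(build_sampling_params(temperature=0.0, seed=1234))
```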