Chain of thought (CoT) prompting is a technique where you prompt a language model to reason step by step before giving its final answer. It works because intermediate reasoning steps become part of the model's context, allowing it to build on correct intermediate conclusions rather than jumping to an answer in one shot. Wei et al.'s 2022 paper reported accuracy gains of roughly 10 to 40 percentage points on reasoning benchmarks, with the gains concentrated in the largest models tested.
The key insight is this: chain of thought does not make the model smarter. It makes the model's reasoning visible and sequential, which reduces the probability of errors in longer chains because each correct intermediate step constrains the next.
## Why CoT Works
When you ask a model a complex question without CoT, it starts producing the answer immediately, with no intermediate tokens to condition on. The "answer" is whatever token sequence is most likely given the question alone. For simple questions, the most likely answer is usually the correct one. For questions requiring multiple reasoning steps, the probability of generating the correct final answer in one shot decreases as the number of steps increases.
When you add CoT, the model generates the intermediate steps first. Each step becomes part of the context for the next step. This is similar to why humans solve math problems by writing out the work rather than computing the answer mentally: the written intermediate values are more reliable than working memory.
The original CoT paper (Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022) tested the effect across arithmetic, commonsense, and symbolic reasoning tasks. On GSM8K (grade school math), adding CoT to PaLM 540B improved accuracy from about 18% to about 57%. The improvement was largest on the most capable models; smaller models showed minimal gains.
## Pattern 1: Zero-Shot CoT ("Let's Think Step by Step")
The simplest form. Add a phrase that triggers step-by-step reasoning.
**Without CoT:**
Prompt: "If a train leaves Chicago at 3pm traveling at 80mph and another leaves New York at 4pm traveling at 100mph, and the cities are 790 miles apart, when do they meet?"
Output: Often incorrect or no working shown.
**With zero-shot CoT:**
Prompt: "If a train leaves Chicago at 3pm traveling at 80mph and another leaves New York at 4pm traveling at 100mph, and the cities are 790 miles apart, when do they meet? Let's think step by step."
Output: "By 4pm, the Chicago train has already traveled 80 miles. The remaining distance is 790 - 80 = 710 miles. After 4pm, both trains are moving toward each other at a combined speed of 80 + 100 = 180mph. Time to meet: 710 / 180 = 3.94 hours. 4pm + 3.94 hours = approximately 7:57pm. They meet around 7:57pm."
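The arithmetic in that output can be checked directly. The following is a small verification sketch for this one example; the variable names are illustrative and not part of any prompt.

```python
# Check the train example: the Chicago train leaves at 3pm at 80 mph,
# the New York train leaves at 4pm at 100 mph, and the cities are 790 miles apart.
distance_miles = 790
chicago_speed, new_york_speed = 80, 100

head_start_miles = chicago_speed * 1                 # Chicago train runs alone from 3pm to 4pm
remaining_miles = distance_miles - head_start_miles  # gap left at 4pm
closing_speed = chicago_speed + new_york_speed       # trains approach each other
hours_after_4pm = remaining_miles / closing_speed

hours = int(hours_after_4pm)
minutes = round((hours_after_4pm - hours) * 60)
meeting_time = f"{4 + hours}:{minutes:02d}pm"
print(meeting_time)  # 7:57pm
```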
Trigger phrases that work: "Let's think step by step," "Work through this carefully," "Think through this before answering," "Show your reasoning." The specific phrase matters less than having any phrase that signals that intermediate steps should be shown.
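In code, zero-shot CoT amounts to appending the trigger phrase to the question before sending it. A minimal sketch follows; the helper name `with_zero_shot_cot` and the sample question are hypothetical, and the call to an actual model is left out.

```python
DEFAULT_TRIGGER = "Let's think step by step."


def with_zero_shot_cot(question: str, trigger: str = DEFAULT_TRIGGER) -> str:
    """Append a reasoning trigger phrase to a question (hypothetical helper)."""
    return f"{question.strip()} {trigger}"


# Illustrative question, not taken from any benchmark.
prompt = with_zero_shot_cot("A shop sells pens at 3 for $2. How much do 12 pens cost?")
print(prompt)
```

Keeping the trigger as a parameter makes it easy to compare phrases on your own task rather than assuming one is best.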
## Pattern 2: Few-Shot CoT
You provide examples where the answer includes step-by-step reasoning. The model learns the reasoning pattern from the examples.
**Few-shot CoT example for word problems:**
Q: A baker made 3 dozen cookies. She sold 2/3 of them and gave away 8. How many are left?
A: Start with 3 dozen = 36 cookies. Sold 2/3: 36 × (2/3) = 24 sold. Remaining after selling: 36 - 24 = 12. Given away: 12 - 8 = 4. She has 4 cookies left.
Q: A pool holds 2,000 gallons. It loses 50 gallons per day to evaporation and gains 30 gallons per day from rain. How many days until the pool is half full?
A:
The model applies the "show all arithmetic, state intermediate values, narrate what each calculation represents" pattern learned from the example.
Few-shot CoT is more reliable than zero-shot CoT for complex domains where "step by step" is ambiguous. The examples define what "step" means for your specific problem type.
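One way to assemble such a prompt programmatically is sketched below. The function name `build_few_shot_cot_prompt` and the `Q:`/`A:` layout are assumptions that mirror the example above, not a required format.

```python
def build_few_shot_cot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Format worked (question, reasoning) pairs, then the new question with an open 'A:'."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    (
        "A baker made 3 dozen cookies. She sold 2/3 of them and gave away 8. How many are left?",
        "Start with 3 dozen = 36 cookies. Sold 2/3: 36 × (2/3) = 24 sold. "
        "Remaining after selling: 36 - 24 = 12. Given away: 12 - 8 = 4. She has 4 cookies left.",
    )
]
question = (
    "A pool holds 2,000 gallons. It loses 50 gallons per day to evaporation and gains "
    "30 gallons per day from rain. How many days until the pool is half full?"
)
prompt = build_few_shot_cot_prompt(examples, question)
print(prompt)
```

Ending on a bare `A:` invites the model to continue in the style of the worked example. For reference, assuming the pool starts full, the net loss is 20 gallons per day and 1,000 gallons must be lost, so the expected answer is 50 days.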
## Pattern 3: Self-Consistency CoT
Generate multiple chains of thought and select the most frequent answer. This works because different chains may make different errors, but correct reasoning paths are more likely to converge on the right answer.
**How to implement:**
- Send the same CoT prompt 5 to 10 times with temperature > 0 (0.5 to 0.7)
- Extract the final answer from each response
- Take the majority answer
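The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `sample` is a hypothetical callable standing in for a model call at temperature > 0, and the prompt is assumed to ask the model to end with a line of the form `Answer: ...` so the final answer can be extracted.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def self_consistent_answer(sample: Callable[[str], str], prompt: str, n: int = 5) -> Optional[str]:
    """Sample n responses and return the most common extracted answer."""
    answers = [extract_final_answer(sample(prompt)) for _ in range(n)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Canned responses stand in for a real model here; replace the lambda with an actual call.
canned = iter([
    "710 / 180 = 3.94 hours. Answer: 7:57pm",
    "790 / 180 = 4.39 hours. Answer: 8:23pm",
    "Remaining 710 miles at 180 mph. Answer: 7:57pm",
    "Answer: 7:57pm",
    "Answer: 8:00pm",
])
result = self_consistent_answer(lambda _prompt: next(canned), "When do the trains meet?", n=5)
print(result)  # 7:57pm
```

Exact string matching is the simplest voting rule. Answers that differ only in formatting (for example "7:57 pm" and "7:57pm") would need normalizing before counting.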
Wang et al. ("Self-Consistency Improves Chain of Thought Reasoning in Language Models," ICLR 2023) showed this approach improved accuracy by roughly 10 to 15 percentage points over single-sample CoT on several math and reasoning benchmarks. The cost scales with the number of samples: 5 to 10 samples means 5 to 10 times more API calls.
When to use it: only when correctness is critical and cost is not the primary constraint. For a production system verifying financial calculations, self-consistency is worth the cost. For a chatbot answering general questions, it is not.
## Pattern 4: Scratchpad CoT
Ask the model to write in a designated "scratchpad" section before giving the final answer. This separates the messy reasoning from the clean output.
**Prompt:**
Answer the following question. First, work through your reasoning inside <scratchpad> tags. Then provide your final answer.
Question: Evaluate whether this code snippet has any bugs that could cause a runtime error in Python 3.10.
```python
def process_items(items):
    result = []
    for i in range(len(items)):
        if items[i] > 0:
            result.append(items[i] * 2)
    return result[0]
```
**Output:**
The function has a bug: it will raise an IndexError on the line "return result[0]" when the input list is empty or contains only non-positive values. The function should either return result (the full list) or handle the empty case explicitly.
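The diagnosis in that output can be reproduced by running the snippet on the two failing inputs it names. The sketch below also shows one possible fix; `process_items_fixed` is a hypothetical name, and returning the full list is an assumption about what the function was meant to do.

```python
def process_items(items):
    """Original snippet: fails when no positive item is found."""
    result = []
    for i in range(len(items)):
        if items[i] > 0:
            result.append(items[i] * 2)
    return result[0]


def process_items_fixed(items):
    """One possible fix (an assumption about intent): return every doubled positive value."""
    return [item * 2 for item in items if item > 0]


# Collect the inputs that trigger the IndexError described in the output.
failures = []
for bad_input in ([], [-1, 0]):
    try:
        process_items(bad_input)
    except IndexError:
        failures.append(bad_input)

print(failures)                         # [[], [-1, 0]]
print(process_items_fixed([1, -2, 3]))  # [2, 6]
```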
Scratchpad CoT keeps the output clean while still getting the benefit of visible reasoning. For user-facing applications where you do not want to show reasoning, use scratchpad tags and strip them before displaying.
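Stripping the scratchpad before display can be done with a regular expression. This is a minimal sketch that assumes the model wraps its reasoning in well-formed `<scratchpad>...</scratchpad>` tags; the sample response is made up for illustration.

```python
import re

# DOTALL lets "." match newlines, since scratchpad reasoning usually spans several lines.
SCRATCHPAD_RE = re.compile(r"<scratchpad>.*?</scratchpad>", re.DOTALL)


def strip_scratchpad(response: str) -> str:
    """Remove every <scratchpad>...</scratchpad> section and trim surrounding whitespace."""
    return SCRATCHPAD_RE.sub("", response).strip()


raw = (
    "<scratchpad>\nresult stays empty if no item is positive, "
    "so result[0] can fail.\n</scratchpad>\n"
    "The function raises IndexError when no positive items exist."
)
clean = strip_scratchpad(raw)
print(clean)
```

An unclosed `<scratchpad>` tag would not match this pattern, so a production version would want to detect that case rather than display the raw reasoning.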
## Pattern 5: Structured CoT
Force the reasoning into a specific structure that maps to your problem type.
**For decision analysis:**
Analyze the following decision using this exact structure:
SITUATION: [one sentence summary]
OPTIONS CONSIDERED: [list the alternatives]
KEY TRADEOFFS: [for each option, one pro and one con]
RECOMMENDATION: [the choice with the primary reason]
CONFIDENCE: [High/Medium/Low with one sentence explaining why]
Decision: Should a 4-person startup use a monorepo or a multi-repo structure?
Structured CoT is valuable when the output needs to be consistent across many inputs, because the structure serves as both a reasoning scaffold and an output format specification.
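Because the structure doubles as an output format, responses can be validated mechanically. The sketch below assumes each field label starts its own line; `parse_structured_response` is a hypothetical helper and the sample response is invented for illustration.

```python
import re

FIELDS = ["SITUATION", "OPTIONS CONSIDERED", "KEY TRADEOFFS", "RECOMMENDATION", "CONFIDENCE"]


def parse_structured_response(text: str) -> dict[str, str]:
    """Split a response into the expected fields; raise ValueError if any field is missing."""
    pattern = r"^(" + "|".join(re.escape(f) for f in FIELDS) + r"):"
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # With one capture group, re.split yields [preamble, name1, body1, name2, body2, ...].
    parsed = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    missing = [f for f in FIELDS if f not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed


# Invented response, used only to exercise the parser.
sample = """SITUATION: A 4-person startup must choose a repository layout.
OPTIONS CONSIDERED: monorepo, multi-repo
KEY TRADEOFFS: monorepo: simpler sharing, heavier tooling; multi-repo: isolation, more coordination
RECOMMENDATION: monorepo, because a small team shares code constantly
CONFIDENCE: Medium, since the right answer depends on future team growth"""
parsed = parse_structured_response(sample)
print(parsed["RECOMMENDATION"])
```

A response that fails validation can be retried or flagged, which is how the structure helps keep output consistent across many inputs.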
## Pattern 6: Stepback Prompting
Ask a broader background question before the specific question. This retrieves relevant foundational knowledge into context before answering the specific question.
**Without stepback:**
Prompt: "Why did the 2008 financial crisis cause such a sharp drop in consumer spending?"
**With stepback:**
Prompt: "First, briefly explain how consumer confidence and credit availability interact during financial downturns. Then, using that framework, explain why the 2008 financial crisis caused such a sharp drop in consumer spending."
The stepback question puts relevant background into the model's context, which the model then conditions on when answering the specific question. This is most useful for questions whose answers depend on domain knowledge.
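A stepback prompt can be built from two parts: the broader background question and the specific question. The helper below is a hypothetical sketch of the single-prompt form used in the example above; the wording of the template is an assumption.

```python
def build_stepback_prompt(background_question: str, specific_question: str) -> str:
    """Combine a broad background question with the specific question in one prompt."""
    return (
        f"First, briefly answer this background question: {background_question} "
        f"Then, using that framework, answer this specific question: {specific_question}"
    )


prompt = build_stepback_prompt(
    "How do consumer confidence and credit availability interact during financial downturns?",
    "Why did the 2008 financial crisis cause such a sharp drop in consumer spending?",
)
print(prompt)
```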
## Pattern 7: Analogical CoT
Ask the model to solve a similar but simpler problem first, then apply the same reasoning to the actual problem.
**Prompt:**
First solve this simpler version: if you have 3 workers each completing 2 tasks per hour, how many tasks can they complete in 4 hours?
Now apply the same logic to solve this: if a distributed system has 12 processing nodes each handling 150 requests per second, and you need to process a burst of 500,000 requests, what is the minimum time to completion assuming perfect load balancing?
Analogical CoT transfers reasoning patterns from familiar to unfamiliar domains. It is particularly effective for abstract concepts where a concrete analog makes the reasoning structure visible.
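Both problems in the prompt above share one structure: units times rate gives throughput, and throughput relates work to time. A short sketch of the expected arithmetic, with illustrative function names:

```python
def total_work(units: int, rate_per_unit: float, duration: float) -> float:
    """Work completed by `units` workers at `rate_per_unit` each over `duration`."""
    return units * rate_per_unit * duration


def time_to_complete(units: int, rate_per_unit: float, work: float) -> float:
    """Time needed for `units` workers at `rate_per_unit` each to finish `work`."""
    return work / (units * rate_per_unit)


# Simpler problem: 3 workers, 2 tasks per hour each, 4 hours.
simple_tasks = total_work(3, 2, 4)
print(simple_tasks)  # 24

# Actual problem: 12 nodes, 150 requests per second each, 500,000 requests.
burst_seconds = time_to_complete(12, 150, 500_000)
print(round(burst_seconds, 1))  # 277.8
```

Under the perfect load balancing assumption, the burst takes about 277.8 seconds, or roughly 4 minutes 38 seconds.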
## Pattern 8: Verification CoT
Ask the model to solve a problem and then verify its own answer using a different approach.
**Prompt:**
Solve the following problem, then verify your answer using a different method.
Problem: What is the probability of getting at least one head in three coin flips?
**Output with verification:**
Direct calculation: P(at least one head) = 1 - P(no heads). P(no heads in 3 flips) = (1/2)^3 = 1/8. P(at least one head) = 1 - 1/8 = 7/8.
Verification by enumeration: All outcomes with 3 flips: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. That is 8 outcomes. Outcomes with no heads: only TTT. Outcomes with at least one head: 7. Probability: 7/8.
Both methods give 7/8. The answer is confirmed: 7/8 or 87.5%.
Verification CoT reduces errors by catching mistakes that the first reasoning chain made, provided the second method is genuinely independent of the first. The cost is roughly double the tokens for tasks where it is applied.
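The same two-method check from the coin flip example can be reproduced in code, using exact fractions so the comparison involves no rounding. This is a sketch of the worked example only, not of how a model verifies its answers internally.

```python
from fractions import Fraction
from itertools import product

# Method 1: complement rule, 1 - P(no heads in 3 flips).
by_complement = 1 - Fraction(1, 2) ** 3

# Method 2: enumerate all 2^3 outcomes and count those containing at least one head.
outcomes = list(product("HT", repeat=3))
with_head = [outcome for outcome in outcomes if "H" in outcome]
by_enumeration = Fraction(len(with_head), len(outcomes))

methods_agree = by_complement == by_enumeration
print(by_complement, float(by_complement), methods_agree)  # 7/8 0.875 True
```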
## CoT vs. Self-Consistency vs. Tree of Thought
All three are reasoning enhancement techniques. Here is when to use each.
**Use CoT** for most reasoning tasks. It is cheap, reliable, and effective for problems with a clear solution path.
**Use self-consistency** when a single CoT chain produces inconsistent results across runs, or when the cost of an error is high enough to justify multiple generations.
**Use tree of thought** (Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," NeurIPS 2023) for problems with large solution spaces where exploring multiple branches is necessary. Tree of thought generates multiple reasoning branches, evaluates each, and explores the most promising ones. It is significantly more expensive than CoT and is appropriate for complex planning, creative problems, or problems where early choices constrain later options.
The practical rule: start with CoT. If results are inconsistent, add self-consistency. If the problem is inherently branchy, consider tree of thought.