The Gap Between Reasoning and Acting
Chain-of-thought (CoT) prompting elicits step-by-step reasoning from LLMs, but reasoning alone cannot access external information or take actions in the world. Action-only approaches (API calls, web search) lack the reasoning chain needed to handle multi-step problems that require planning. The ReAct paper by Yao et al. (arXiv:2210.03629) proposed combining both.
The Thought-Action-Observation Loop
ReAct follows a strict interleaved pattern:
- Thought: The model reasons about what it knows and what it needs to do next
- Action: The model calls a specific tool with arguments
- Observation: The tool returns a result
- Repeat until the model emits a final answer (the Finish action in the trace below) or a step limit is reached
Example trace for "What is the elevation of the highest mountain in the country that won the 2022 FIFA World Cup?":
Thought: I need to find which country won the 2022 FIFA World Cup.
Action: Search[2022 FIFA World Cup winner]
Observation: Argentina won the 2022 FIFA World Cup, defeating France.
Thought: The highest mountain in Argentina is Aconcagua. I need its elevation.
Action: Search[Aconcagua elevation]
Observation: Aconcagua is 6,961 meters (22,838 feet) above sea level.
Thought: I have all the information needed.
Action: Finish[6,961 meters]
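The loop behind a trace like this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: react_loop, scripted_model, and fake_search are hypothetical names, the model is replaced by a scripted stub that replays the steps above, and the search tool returns canned strings instead of calling a real API.

```python
import re
from typing import Callable, Dict

# Matches lines such as "Action: Search[Aconcagua elevation]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model steps and tool calls until a Finish action or the step limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model sees the whole transcript and emits "Thought: ...\nAction: Tool[arg]".
        step = model(transcript)
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError(f"No parsable action in model output: {step!r}")
        name, arg = match.group(1), match.group(2)
        if name == "Finish":
            return arg
        # Feed the tool result back so the next Thought can use it.
        if name in tools:
            observation = tools[name](arg)
        else:
            observation = f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("Step limit reached without a Finish action")


# Hypothetical stand-ins for an LLM and a search API, replaying the trace above.
SCRIPTED_STEPS = [
    "Thought: I need to find which country won the 2022 FIFA World Cup.\n"
    "Action: Search[2022 FIFA World Cup winner]",
    "Thought: The highest mountain in Argentina is Aconcagua. I need its elevation.\n"
    "Action: Search[Aconcagua elevation]",
    "Thought: I have all the information needed.\n"
    "Action: Finish[6,961 meters]",
]

CANNED_RESULTS = {
    "2022 FIFA World Cup winner": "Argentina won the 2022 FIFA World Cup, defeating France.",
    "Aconcagua elevation": "Aconcagua is 6,961 meters (22,838 feet) above sea level.",
}


def scripted_model(transcript: str) -> str:
    # Pick the next scripted step based on how many observations have been recorded.
    return SCRIPTED_STEPS[transcript.count("Observation:")]


def fake_search(query: str) -> str:
    return CANNED_RESULTS.get(query, "No results found.")


answer = react_loop(
    "What is the elevation of the highest mountain in the country "
    "that won the 2022 FIFA World Cup?",
    scripted_model,
    {"Search": fake_search},
)
print(answer)
```

Swapping scripted_model for a real LLM call and fake_search for a real search client keeps the control flow unchanged. The step limit guards against a model that never emits Finish.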
Why Pure CoT Fails for Tool Use
CoT without tools is prone to confabulation: when the model lacks a fact, it may invent one. A pure CoT chain might answer the question above by hallucinating that France won, or by guessing an incorrect elevation. Without grounding in external sources, even valid reasoning from incorrect premises produces wrong answers.
Why Pure Action Fails
Pure action sequences (just calling APIs in sequence) lack the reasoning to decide which action to take next, how to interpret conflicting observations, or when to stop. The Thought steps provide the planning layer that makes action selection coherent.
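With LangChain, a ReAct agent can be set up roughly as follows (this assumes the langchain, langchain-openai, langchain-community, and duckduckgo-search packages are installed and OPENAI_API_KEY is set in the environment):

    from langchain import hub
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain_community.tools import DuckDuckGoSearchRun
    from langchain_openai import ChatOpenAI

    # temperature=0 keeps the reasoning traces as repeatable as possible
    llm = ChatOpenAI(model="gpt-4", temperature=0)
    tools = [DuckDuckGoSearchRun()]

    # Use the standard ReAct prompt template from LangChain hub
    prompt = hub.pull("hwchase17/react")

    agent = create_react_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,  # Shows Thought/Action/Observation traces
        max_iterations=10,  # Stop after 10 Thought/Action rounds
    )

    result = agent_executor.invoke({
        "input": "What is the population of the capital of the country that hosts CERN?"
    })
    print(result["output"])  # Final answer text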
Benchmark Results
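On HotpotQA (multi-hop question answering), the paper reports that ReAct with a Wikipedia search API reduces the hallucination and error propagation seen with CoT alone. On WebShop (web shopping agent), ReAct outperformed action-only baselines by about 10 percentage points in success rate. On ALFWorld (interactive household tasks), ReAct achieved 71% success versus 45% for action-only agents, and the paper reports an absolute gain of 34 percentage points over imitation and reinforcement learning baselines.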
Modern Variants
Reflexion adds a self-reflection step after failures: the agent critiques its failed attempt and stores the reflection in memory for future episodes. ReWOO separates planning from execution, generating the full plan before making any tool calls. Toolformer internalizes tool use into the model weights through fine-tuning rather than relying on prompting.
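To make the contrast with the interleaved loop concrete, here is a rough sketch of the plan-first idea described for ReWOO. It is an illustration under assumptions rather than that paper's code: make_plan stands in for a planner LLM call and returns a hard-coded plan, the tuple-based plan format and the #E1-style evidence variables are simplifications chosen for this example, and fake_search returns canned strings.

```python
import re
from typing import Callable, Dict, List, Tuple

# One plan step: (evidence variable, tool name, argument that may reference earlier variables).
PlanStep = Tuple[str, str, str]


def make_plan(question: str) -> List[PlanStep]:
    """Hypothetical planner: the whole plan is produced before any tool runs."""
    return [
        ("#E1", "Search", "2022 FIFA World Cup winner"),
        ("#E2", "Search", "highest mountain in #E1 elevation"),
    ]


def execute_plan(
    plan: List[PlanStep], tools: Dict[str, Callable[[str], str]]
) -> Dict[str, str]:
    """Run each step in order, substituting earlier evidence into later arguments."""
    evidence: Dict[str, str] = {}
    for var, tool_name, arg in plan:
        # Replace references such as "#E1" with the evidence collected so far.
        resolved = re.sub(r"#E\d+", lambda m: evidence.get(m.group(0), m.group(0)), arg)
        evidence[var] = tools[tool_name](resolved)
    return evidence


# Canned results standing in for a real search tool.
CANNED_RESULTS = {
    "2022 FIFA World Cup winner": "Argentina",
    "highest mountain in Argentina elevation": "6,961 meters",
}


def fake_search(query: str) -> str:
    return CANNED_RESULTS.get(query, "No results found.")


plan = make_plan("What is the elevation of the highest mountain in the 2022 World Cup winner?")
evidence = execute_plan(plan, {"Search": fake_search})
print(evidence["#E2"])
```

Because the plan is fixed up front, the model is not consulted between tool calls. The trade-off is that the plan cannot adapt to a surprising observation the way a ReAct Thought step can.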