Why Vanilla LangChain Agents Fail in Production
LangChain's original AgentExecutor is effectively a black box. The loop logic is hidden, error handling is inconsistent, and there is no first-class way to add cycles (revisiting a step based on new information) or branches (taking different paths based on tool output). Production agents built this way tend to hallucinate tool calls, loop infinitely, or silently drop errors.
LangGraph solves this by modelling agents as explicit state machines: directed graphs where nodes are functions and edges are conditional transitions. You see exactly what happens, in what order, and why.
Core Concepts
- StateGraph: the graph object; you add nodes and edges to it
- State: a typed dict passed between all nodes; each node returns a partial update
- Conditional edges: functions that inspect state and return the name of the next node
- Checkpointers: persist state between steps for long-running or resumable workflows
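The first three concepts can be illustrated without the library. What follows is a minimal pure-Python sketch, not LangGraph's implementation: `REDUCERS`, `merge_update`, `run_graph`, and the node names are hypothetical names chosen for this example. It shows nodes returning partial updates, a reducer combining list values, and a conditional edge that cycles back before branching onward.

```python
import operator
from typing import Callable, Dict

# Hypothetical reducer table: keys listed here are combined, all others are overwritten.
REDUCERS: Dict[str, Callable] = {"messages": operator.add}


def merge_update(state: dict, update: dict) -> dict:
    """Apply a node's partial update to a copy of the state."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


def run_graph(nodes, edges, entry, state, max_steps=10):
    """Run nodes until an edge function returns "END" or max_steps is reached."""
    current, visited = entry, []
    for _ in range(max_steps):
        state = merge_update(state, nodes[current](state))
        visited.append(current)
        current = edges[current](state)  # conditional edge: state -> next node name
        if current == "END":
            return state, visited
    raise RuntimeError("step limit reached without finishing")


nodes = {
    "search": lambda s: {
        "attempts": s.get("attempts", 0) + 1,
        "messages": [f"search attempt {s.get('attempts', 0) + 1}"],
    },
    "answer": lambda s: {"final_answer": f"done after {s['attempts']} searches"},
}
edges = {
    # Cycle back to "search" until two attempts have been made, then move on.
    "search": lambda s: "answer" if s["attempts"] >= 2 else "search",
    "answer": lambda s: "END",
}

final_state, visited = run_graph(nodes, edges, "search", {"messages": ["question"]})
print(visited)
print(final_state["messages"])
```

The `max_steps` guard is a design choice of this sketch: an explicit graph makes it easy to cap cycles in one place instead of relying on each node to behave.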
Building a Research Agent
pip install langgraph langchain-openai
from typing import Annotated, TypedDict
import operator

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    # operator.add is the reducer: message lists returned by nodes are appended
    messages: Annotated[list, operator.add]
    search_results: str
    final_answer: str


llm = ChatOpenAI(model="gpt-4o-mini")  # reads OPENAI_API_KEY from the environment


def search_node(state: AgentState) -> dict:
    # Simulate a web search tool call, using the latest message as the query
    query = state["messages"][-1].content
    results = f"Search results for: {query} [simulated]"
    return {"search_results": results}


def synthesize_node(state: AgentState) -> dict:
    # The latest message is still the user's question; search_node adds no messages
    question = state["messages"][-1].content
    context = state["search_results"]
    prompt = f"Based on: {context}\n\nAnswer this question concisely: {question}"
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"final_answer": response.content, "messages": [response]}


def router(state: AgentState) -> str:
    # Conditional edge: move on once results exist, otherwise search again
    if state.get("search_results"):
        return "synthesize"
    return "search"


graph = StateGraph(AgentState)
graph.add_node("search", search_node)
graph.add_node("synthesize", synthesize_node)
graph.set_entry_point("search")
# Map every value router can return; an unmapped return value fails at runtime
graph.add_conditional_edges("search", router, {"synthesize": "synthesize", "search": "search"})
graph.add_edge("synthesize", END)
app = graph.compile()

result = app.invoke({"messages": [HumanMessage(content="What is PagedAttention?")]})
print(result["final_answer"])
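Because the router above can send the graph back to "search", a search tool that keeps returning nothing would cycle indefinitely. One way to bound that is to count attempts in state, sketched below. The `search_attempts` field, the `MAX_SEARCH_ATTEMPTS` limit, and the "give_up" node are assumptions made for this example, not part of the graph built above. To use it, the search node would need to increment the counter, and "give_up" would need to be added as a node and to the edge mapping.

```python
from typing import TypedDict

MAX_SEARCH_ATTEMPTS = 3  # hypothetical limit; tune per workload


class GuardedState(TypedDict, total=False):
    search_results: str
    search_attempts: int  # assumed to be incremented by the search node


def guarded_router(state: GuardedState) -> str:
    """Return the next node name, giving up after too many empty searches."""
    if state.get("search_results"):
        return "synthesize"
    if state.get("search_attempts", 0) >= MAX_SEARCH_ATTEMPTS:
        return "give_up"
    return "search"


print(guarded_router({"search_results": "some results"}))
print(guarded_router({"search_attempts": 1}))
print(guarded_router({"search_attempts": 3}))
```

LangGraph also accepts a `recursion_limit` in the run config as a backstop, but an explicit counter lets the graph end with a controlled fallback instead of an exception.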
Persistence With Checkpoints
Add a checkpointer to resume long-running agents across process restarts:
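# Recent releases ship this separately: pip install langgraph-checkpoint-sqlite
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# Use a file path; a ":memory:" database is lost when the process exits
conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
memory = SqliteSaver(conn)
app = graph.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "session-123"}}
result = app.invoke({"messages": [HumanMessage(content="Start research")]}, config)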
Each thread_id gets its own isolated state history.
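To make the isolation concrete, here is a toy checkpointer built on the standard library's sqlite3 module. It is illustrative only: `ToyCheckpointer`, its table layout, and its methods are invented for this sketch and do not reflect how SqliteSaver stores data. It appends one JSON snapshot per step and keys every row by thread_id, so one thread's history never appears in another's.

```python
import json
import sqlite3


class ToyCheckpointer:
    """Illustrative only: one JSON snapshot per step, keyed by thread_id."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, state: dict) -> int:
        """Append a snapshot for this thread and return its step number."""
        row = self.conn.execute(
            "SELECT COALESCE(MAX(step), -1) + 1 FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        step = row[0]
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def history(self, thread_id: str) -> list:
        """Return this thread's snapshots, oldest first."""
        rows = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY step",
            (thread_id,),
        ).fetchall()
        return [json.loads(r[0]) for r in rows]


saver = ToyCheckpointer()
saver.put("session-123", {"search_results": "first"})
saver.put("session-123", {"search_results": "first", "final_answer": "done"})
saver.put("session-456", {"search_results": "other"})

print(len(saver.history("session-123")))
print(saver.history("session-456"))
```

With a real checkpointer attached, `app.get_state(config)` returns the latest snapshot for the thread named in `config`.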
Streaming Intermediate Steps
for event in app.stream({"messages": [HumanMessage(content="Explain HNSW")]}, config):
    for key, value in event.items():
        print(f"Node: {key} → {value}")
Streaming matters for user-facing apps: each event arrives as soon as a node finishes, so the UI can show partial results instead of waiting for the final answer.
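Raw state updates are usually too verbose to show to a user. The sketch below turns streamed events into short status lines, assuming each event is a dict mapping a node name to that node's update, as in the loop above. `summarize_events` is a hypothetical helper, and the events here are hand-written stand-ins so the example runs without an LLM call.

```python
from typing import Any, Dict, Iterable, Iterator


def summarize_events(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield one short status line per node update."""
    for event in events:
        for node, update in event.items():
            if isinstance(update, dict) and update:
                changed = ", ".join(sorted(update))
                yield f"{node}: updated {changed}"
            else:
                yield f"{node}: no state change"


# Hand-written stand-ins for what app.stream(...) would yield.
fake_events = [
    {"search": {"search_results": "Search results for: Explain HNSW [simulated]"}},
    {"synthesize": {"final_answer": "placeholder answer", "messages": ["placeholder"]}},
]

for line in summarize_events(fake_events):
    print(line)
```

In an application, `fake_events` would be replaced by the iterator returned from `app.stream(...)`.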
LangSmith Integration
Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY to get full traces in LangSmith. Every node input/output, tool call, and LLM response is recorded with latency and token counts.
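The variables can be exported in the shell or set from Python before the graph runs. The helper below is a small sketch of the second option: `enable_langsmith_tracing` is a hypothetical name, and it only switches tracing on when an API key is already present, so a missing key does not produce confusing failures later.

```python
import os
from typing import MutableMapping, Optional


def enable_langsmith_tracing(env: Optional[MutableMapping[str, str]] = None) -> bool:
    """Turn tracing on if an API key is configured; return whether it was enabled."""
    env = os.environ if env is None else env
    if not env.get("LANGCHAIN_API_KEY"):
        return False
    env["LANGCHAIN_TRACING_V2"] = "true"
    return True


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"LANGCHAIN_API_KEY": "placeholder-key"}
print(enable_langsmith_tracing(demo_env))
print(demo_env["LANGCHAIN_TRACING_V2"])
```

Calling `enable_langsmith_tracing()` with no argument applies the same check to the current process environment.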
Full tutorials are available in the LangGraph documentation.