SWE-Bench: The Coding Benchmark That Matters
SWE-Bench Verified tests whether a model can resolve real GitHub issues in popular open-source Python repositories. It's harder than HumanEval because the model must understand a large existing codebase, identify the relevant files, write a patch, and pass the repository's test suite.
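Claude 3.5 Sonnet (the October 2024 release, claude-3-5-sonnet-20241022) scores 49% on SWE-Bench Verified, compared with 38% for GPT-4o. Reported scores depend on the agent scaffold wrapped around the model, so treat the 11-point gap as indicative rather than exact. In practice it suggests fewer iterations when debugging real software.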
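The pass/fail rule behind these percentages can be sketched in a few lines. In SWE-bench, each task lists FAIL_TO_PASS tests (failing before the fix, expected to pass after it) and PASS_TO_PASS tests (which must keep passing). This is a simplified illustration, not the official harness; `is_resolved`, `resolve_rate`, and the test names are hypothetical.

```python
from typing import Dict, Iterable, List


def is_resolved(
    test_results: Dict[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if the patched repo passes every required test.

    test_results maps a test name to True (passed) or False (failed).
    A test missing from the results counts as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(test_results.get(name, False) for name in required)


def resolve_rate(outcomes: List[bool]) -> float:
    """Percentage of tasks resolved across a benchmark run."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical task: the patch fixes the target test but breaks an existing one,
# so the task does not count as resolved.
results = {"test_issue_fixed": True, "test_existing_behaviour": False}
print(is_resolved(results, ["test_issue_fixed"], ["test_existing_behaviour"]))
print(resolve_rate([True, False, False, True]))
```

A patch that fixes the issue but regresses another test scores zero for that task, which is part of why the benchmark is harder than writing a standalone function.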
Model Specs
- Context window: 200,000 tokens (roughly 500 pages of text)
- Pricing: $3.00 per million input tokens, $15.00 per million output tokens
- Extended thinking: Not offered on Claude 3.5 Sonnet; available via API starting with Claude 3.7 Sonnet, for complex multi-step reasoning
- Computer use: Beta feature for browser/desktop automation
- Tool use: Native function calling with parallel tool execution
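To make the pricing concrete, here is a small sketch that turns token counts into a dollar estimate using the list prices above. The function name and the example token counts are illustrative assumptions, and real bills can differ (for example with caching or batch discounts).

```python
INPUT_USD_PER_MTOK = 3.00    # list price per million input tokens (from the specs above)
OUTPUT_USD_PER_MTOK = 15.00  # list price per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a prompt that fills the 200k window, with a 1,024-token reply.
print(f"${estimate_cost_usd(200_000, 1_024):.4f}")
```

Output tokens cost five times as much as input tokens, so long generated patches tend to dominate the bill more than large prompts do.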
Calling the API With Python
Install the Anthropic Python SDK:
pip install anthropic

Then send a request:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

# Triple-quoted so the function under review can span multiple lines.
prompt = """Review this Python function for bugs and suggest improvements:

def process(data):
    result = []
    for i in data:
        result.append(i * 2)
    return result
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)

# The reply is a list of content blocks; the first one holds the text.
print(message.content[0].text)
Extended Thinking Mode
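For hard algorithmic or architectural problems, extended thinking lets the model reason through the problem before answering. Claude 3.5 Sonnet does not accept the thinking parameter; the feature arrived with Claude 3.7 Sonnet, so this example targets that model:

response = client.messages.create(
    # Extended thinking needs Claude 3.7 Sonnet or later; swap in a current model ID.
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{"role": "user", "content": "Design a rate limiter for a distributed API."}],
)

# The response interleaves thinking blocks with the final text blocks.
for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking[:200], "...")
    elif block.type == "text":
        print("Answer:", block.text)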
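The thinking budget has to fit inside the overall output limit. The sketch below checks two constraints before a request is sent: a minimum budget of 1,024 tokens and a budget smaller than max_tokens. Both reflect the API documentation as I understand it and should be confirmed against the current docs. `thinking_params` and `MIN_THINKING_BUDGET` are hypothetical names, not part of the SDK.

```python
from typing import Any, Dict

MIN_THINKING_BUDGET = 1024  # assumed documented minimum; verify against current docs


def thinking_params(max_tokens: int, budget_tokens: int) -> Dict[str, Any]:
    """Build the max_tokens/thinking keyword arguments, rejecting invalid budgets early."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(max_tokens=16000, budget_tokens=10000)
print(params)
```

The resulting dictionary can be unpacked into `client.messages.create(...)` with `**params`. In this example, that leaves 6,000 tokens of headroom for the visible answer.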
Tool Use (Function Calling)
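Claude can request several tool calls in a single turn, which is useful for agentic pipelines that need to fetch from multiple data sources at once. The model only emits the requests; your code executes the tools and returns the results: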
tools = [
    {
        "name": "read_file",
        "description": "Read a file from the repository",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"}
            },
            "required": ["path"],
        },
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    tools=tools,
    messages=[{"role": "user", "content": "What does main.py do?"}],
)
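The request above only declares the tool; when the model decides to use it, the response contains tool_use blocks that your code must execute and answer with tool_result blocks. The following is a minimal sketch of that step. `run_tool_calls`, `TOOL_HANDLERS`, and the local `read_file` implementation are illustrative assumptions, and the demonstration uses a stand-in object instead of a real API response so it runs offline.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List


def read_file(path: str) -> str:
    """Hypothetical local implementation of the read_file tool."""
    return Path(path).read_text(encoding="utf-8")


# Maps tool names from the schema to the Python callables that implement them.
TOOL_HANDLERS = {"read_file": read_file}


def run_tool_calls(content_blocks: List[Any]) -> List[Dict[str, Any]]:
    """Execute every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # text blocks need no action
        try:
            output = TOOL_HANDLERS[block.name](**block.input)
            result = {"type": "tool_result", "tool_use_id": block.id, "content": output}
        except Exception as exc:  # report the failure to the model instead of crashing
            result = {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"{type(exc).__name__}: {exc}",
                "is_error": True,
            }
        results.append(result)
    return results


# Offline demonstration: a stand-in for one content block, pointing at a missing file.
fake_block = SimpleNamespace(
    type="tool_use",
    id="toolu_demo",
    name="read_file",
    input={"path": "definitely_missing_demo_file.txt"},
)
print(run_tool_calls([fake_block]))
```

With a real response, you would check that `response.stop_reason == "tool_use"`, append the assistant turn (`response.content`) to the message list, add a user turn whose content is the list returned by `run_tool_calls(response.content)`, and call the API again. Because all results go back in one user message, several parallel tool calls are answered together.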
Computer Use (Beta)
The computer use feature lets Claude control a browser or desktop to complete tasks autonomously. It's currently in beta and best suited for structured, well-defined automation flows. See the Anthropic docs for the full setup guide.
Summary
Claude 3.5 Sonnet was a leading model for coding tasks at its October 2024 release, though newer Claude models have since superseded it. Its 200k context window, strong SWE-Bench Verified score, and native tool use make it well suited to large refactors, code review, and agentic workflows; extended thinking requires Claude 3.7 Sonnet or later. Explore the full model lineup at anthropic.com/claude/sonnet.