The Anthropic API gives you access to Claude models over HTTP. The key concepts: messages use a structured user/assistant turn format with an optional system prompt, tool use follows a specific content-block pattern, and prompt caching bills repeated prompt prefixes (such as a large system prompt) at 10% of the normal input price, a 90% discount on those tokens. This guide covers the path from first request to production optimization.
Authentication and Setup
Install the official SDK:
pnpm add @anthropic-ai/sdk
Initialize the client:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
If you do not pass apiKey explicitly, the SDK reads ANTHROPIC_API_KEY from the environment automatically. Never hardcode API keys.
Model Names
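Model identifiers used in this guide. These are dated snapshots, and Anthropic deprecates and retires older models over time, so check the current models list before choosing one: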
| Model | Identifier | Use Case |
|---|---|---|
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | Complex tasks, coding, analysis |
| Claude 3 Haiku | claude-3-haiku-20240307 | Fast, cheap, simple tasks |
| Claude 3 Opus | claude-3-opus-20240229 | Maximum capability (slower, expensive) |
Use Haiku for high-volume, lower-complexity tasks (classification, extraction, simple Q&A). Use Sonnet for the majority of production tasks. Reserve Opus for cases where maximum quality justifies the cost.
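That guidance can be encoded as a small routing function. This is a minimal sketch: the task categories and the `pickModel` helper are hypothetical, and the identifiers are the dated snapshots from the table, so substitute current ones in real code.

```typescript
// Hypothetical task categories for illustration; adapt them to your own workload.
type TaskKind =
  | "classification"
  | "extraction"
  | "simple_qa"
  | "coding"
  | "analysis"
  | "long_form"
  | "max_quality";

// Identifiers from the table above (dated snapshots; check the current models list).
const MODELS = {
  haiku: "claude-3-haiku-20240307",
  sonnet: "claude-3-5-sonnet-20241022",
  opus: "claude-3-opus-20240229",
} as const;

type ModelId = (typeof MODELS)[keyof typeof MODELS];

function pickModel(task: TaskKind): ModelId {
  switch (task) {
    // High-volume, lower-complexity work goes to the cheapest tier.
    case "classification":
    case "extraction":
    case "simple_qa":
      return MODELS.haiku;
    // Reserve the most expensive tier for work where quality justifies the cost.
    case "max_quality":
      return MODELS.opus;
    // Everything else defaults to the general-purpose production tier.
    default:
      return MODELS.sonnet;
  }
}

console.log(pickModel("extraction")); // claude-3-haiku-20240307
```

Keeping the mapping in one place means a model upgrade is a one-line change rather than a search through every call site.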
Basic Message Format
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
system: "You are a helpful assistant.",
messages: [
{ role: "user", content: "What is the capital of France?" }
],
});
const text = response.content[0].type === "text"
? response.content[0].text
: "";
Key parameters:
- model: which Claude model to use
- max_tokens: maximum number of output tokens (required, not optional)
- system: system prompt (optional, appears before the conversation)
- messages: array of alternating user/assistant turns
The content array in the response can contain multiple content blocks. For text-only responses there will be one block with type: "text".
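The snippet above reads only `content[0]`. When a response mixes block types (for example, text followed by a tool_use block), it is safer to join every text block. The sketch below uses hand-written local types that approximate the response shape. They are an assumption for illustration, and in real code you would use the SDK's own exported types.

```typescript
// Simplified local types approximating response.content (not the SDK's definitions).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Join every text block instead of assuming content[0] is the only one.
function collectText(content: ContentBlock[]): string {
  return content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("");
}

const sample: ContentBlock[] = [
  { type: "text", text: "Let me look up the weather in Tokyo." },
  { type: "tool_use", id: "toolu_123", name: "get_weather", input: { location: "Tokyo" } },
];
console.log(collectText(sample)); // Let me look up the weather in Tokyo.
```

Joining with an empty string preserves the model's own spacing. Use a newline separator instead if you want the blocks visually separated.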
Multi-Turn Conversations
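// Annotated so TypeScript keeps role as "user" | "assistant" instead of widening it to string
const messages: Anthropic.MessageParam[] = [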
{ role: "user", content: "What is React?" },
{ role: "assistant", content: "React is a JavaScript library..." },
{ role: "user", content: "How does it differ from Vue?" },
];
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages,
});
You maintain conversation history by including all previous turns in the messages array. There is no server-side session — you send the full history with each request.
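Because the client owns the history, a small wrapper helps keep it consistent. The `Conversation` class below is a hypothetical helper, not part of the SDK. It stores text-only turns and returns a copy of the full history to send with each request.

```typescript
// Local message type mirroring the API's { role, content } shape (text-only here).
type Role = "user" | "assistant";
type Message = { role: Role; content: string };

// Hypothetical helper: the API is stateless, so the client keeps every turn
// and resends all of them on each request.
class Conversation {
  private readonly history: Message[] = [];

  // Record a user turn and return the full history to send.
  addUser(text: string): Message[] {
    this.history.push({ role: "user", content: text });
    return this.messages();
  }

  // Record the model's reply once the response arrives.
  addAssistant(text: string): void {
    this.history.push({ role: "assistant", content: text });
  }

  // Return copies so callers cannot mutate the stored history by accident.
  messages(): Message[] {
    return this.history.map((m) => ({ ...m }));
  }
}

const convo = new Conversation();
convo.addUser("What is React?");
convo.addAssistant("React is a JavaScript library...");
const payload = convo.addUser("How does it differ from Vue?");
console.log(payload.length); // 3
```

Since the whole history is billed as input on every request, long conversations grow more expensive per turn. This is one reason prompt caching, covered later, matters for chat applications.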
Streaming Responses
const stream = client.messages.stream({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a haiku about TypeScript." }],
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta"
) {
process.stdout.write(event.delta.text);
}
}
const finalMessage = await stream.finalMessage();
Streaming returns events as text is generated. The content_block_delta events with text_delta deltas contain the streaming text. finalMessage() waits for completion and returns the complete response object.
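The filtering in the loop above can be moved into a pure function, which makes it testable without a network call. The event types below are a simplified local approximation of the SDK's stream events (an assumption). In real code you would still iterate the stream with `for await` as shown above. The SDK's stream object also offers a `text` event (`stream.on("text", ...)`) that delivers only the text deltas.

```typescript
// Simplified local approximation of a subset of streaming events (not the SDK's types).
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | {
      type: "content_block_delta";
      index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" };

// Keep only text deltas and append them in arrival order; tool-input JSON deltas are skipped.
function accumulateText(events: Iterable<StreamEvent>): string {
  let text = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      text += event.delta.text;
    }
  }
  return text;
}

const events: StreamEvent[] = [
  { type: "message_start" },
  { type: "content_block_start", index: 0 },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "Types guard " } },
  { type: "content_block_delta", index: 0, delta: { type: "text_delta", text: "the night" } },
  { type: "content_block_stop", index: 0 },
  { type: "content_block_delta", index: 1, delta: { type: "input_json_delta", partial_json: "{\"loc" } },
  { type: "message_stop" },
];
console.log(accumulateText(events)); // Types guard the night
```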
Tool Use (Function Calling)
Tool use in the Anthropic API requires handling a multi-turn pattern manually. The model returns a tool_use content block, you execute the tool, then send the result back as a tool_result user message.
const tools = [
{
name: "get_weather",
description: "Get the current weather for a location",
input_schema: {
type: "object" as const,
properties: {
location: { type: "string", description: "City name" },
},
required: ["location"],
},
},
];
// First request
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
tools,
messages: [{ role: "user", content: "What is the weather in Tokyo?" }],
});
// Check if model wants to use a tool
if (response.stop_reason === "tool_use") {
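// Only the first tool_use block is handled here; if the model requests several
// tools, each needs its own tool_result in the next user message
const toolUseBlock = response.content.find((b) => b.type === "tool_use");
if (toolUseBlock && toolUseBlock.type === "tool_use") {
// executeWeatherTool is your own function; toolUseBlock.input holds the model's arguments
const toolResult = await executeWeatherTool(toolUseBlock.input);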
// Second request with tool result
const finalResponse = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
tools,
messages: [
{ role: "user", content: "What is the weather in Tokyo?" },
{ role: "assistant", content: response.content },
{
role: "user",
content: [
{
type: "tool_result",
tool_use_id: toolUseBlock.id,
content: JSON.stringify(toolResult),
},
],
},
],
});
}
}
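The example above answers only the first tool_use block. A model can request several tools in one turn, and every tool_use id needs a matching tool_result in the next user message. The sketch below builds all the results at once and marks failures with `is_error: true` so the model can see what went wrong. The local types, the `handlers` map, and the stubbed `get_weather` are hypothetical.

```typescript
// Simplified local types for the blocks involved (assumptions, not the SDK's types).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type Handler = (input: Record<string, unknown>) => unknown;

// Hypothetical local tool implementations keyed by tool name.
const handlers = new Map<string, Handler>([
  ["get_weather", (input) => ({ location: input.location, tempC: 18 })], // stub data, not a real lookup
]);

// Produce one tool_result per tool_use block so every requested call gets an answer.
function buildToolResults(content: Array<TextBlock | ToolUseBlock>): ToolResultBlock[] {
  return content
    .filter((block): block is ToolUseBlock => block.type === "tool_use")
    .map((block): ToolResultBlock => {
      const handler = handlers.get(block.name);
      if (handler === undefined) {
        return { type: "tool_result", tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true };
      }
      try {
        return { type: "tool_result", tool_use_id: block.id, content: JSON.stringify(handler(block.input)) };
      } catch (err) {
        // Report the failure to the model instead of aborting the conversation.
        return { type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true };
      }
    });
}

const toolResults = buildToolResults([
  { type: "text", text: "I'll check both." },
  { type: "tool_use", id: "toolu_a", name: "get_weather", input: { location: "Tokyo" } },
  { type: "tool_use", id: "toolu_b", name: "get_time", input: {} },
]);
console.log(toolResults.length); // 2
```

In the request flow above, `toolResults` would become the `content` of the user message that follows the assistant turn.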
Prompt Caching: 90% Discount on Repeated Content
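Prompt caching is often the most impactful cost optimization available on the Anthropic API. When you mark a content block with cache_control: { type: "ephemeral" }, Anthropic caches the prompt prefix up to and including that block for 5 minutes, and each cache hit refreshes that lifetime. Subsequent requests that begin with the identical prefix are charged 10% of the normal input token price for the cached tokens. Writing the cache costs 25% more than the normal input price.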
This is valuable when your system prompt is large and stable — detailed instructions, tool definitions, reference documentation loaded into the system prompt.
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
system: [
{
type: "text",
text:
"You are an expert coding assistant. Here is the full API documentation:\n\n" +
longDocumentationString, // your own string, potentially thousands of tokens
cache_control: { type: "ephemeral" },
},
],
messages: [{ role: "user", content: userQuestion }],
});
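The first request writes the system prompt to the cache, billed at 1.25 times the normal input price for those tokens. Later requests that reuse the same prefix while the cache is warm (within 5 minutes of its last use) pay 10% of the normal price for the cached tokens. The discount applies only to the cached prefix. Uncached input and all output tokens are billed normally, so how much your total bill drops depends on what share of each request is cached and how often the cache hits. For applications with a large system prompt and frequent requests, the reduction can be substantial.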
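The arithmetic behind those savings is easy to check. This sketch assumes the 5-minute cache pricing described above (writes at 1.25x the base input price, reads at 0.1x) and that every request after the first hits the cache. It covers only the cached prefix, not uncached input or output tokens. `cachedPrefixSavings` is a hypothetical helper.

```typescript
// Multipliers relative to the base input price under 5-minute caching.
const CACHE_WRITE_MULTIPLIER = 1.25; // first request writes the prefix to the cache
const CACHE_READ_MULTIPLIER = 0.1; // later requests read it back

// Fraction of prefix cost saved versus not caching, for `requests` calls that share a
// `prefixTokens`-long cached prefix, assuming every call after the first is a cache hit.
function cachedPrefixSavings(prefixTokens: number, requests: number): number {
  const uncached = prefixTokens * requests;
  const cached =
    prefixTokens * CACHE_WRITE_MULTIPLIER +
    prefixTokens * CACHE_READ_MULTIPLIER * (requests - 1);
  return 1 - cached / uncached;
}

console.log(cachedPrefixSavings(10_000, 10).toFixed(3)); // 0.785
console.log(cachedPrefixSavings(10_000, 1).toFixed(3)); // -0.250
```

With a 10,000-token prefix reused across 10 requests, the prefix costs 21,500 base-price token equivalents instead of 100,000, a 78.5% saving. A single request with no reuse costs 25% more than not caching, so caching pays off only for prefixes that actually repeat.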
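The response usage object reports cache activity: cache_creation_input_tokens counts tokens written to the cache on this request, cache_read_input_tokens counts tokens served from it, and input_tokens counts only the remaining uncached input.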
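Those usage fields are enough to estimate the cost of each request. This sketch assumes `input_tokens` excludes cached tokens, so total input is the sum of all three input fields, and applies the same 1.25x write and 0.1x read multipliers. The price parameters are placeholders to fill in from Anthropic's pricing page, and the numbers in the example are illustrative only.

```typescript
// Usage fields as reported on the response (local type for illustration).
interface Usage {
  input_tokens: number; // uncached input only
  output_tokens: number;
  cache_creation_input_tokens?: number | null; // tokens written to the cache
  cache_read_input_tokens?: number | null; // tokens served from the cache
}

// Placeholder prices in USD per million tokens; take real values from the pricing page.
interface Prices {
  inputPerMTok: number;
  outputPerMTok: number;
}

function requestCostUsd(usage: Usage, prices: Prices): number {
  const written = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  // Express all input as base-price token equivalents: writes cost 1.25x, reads 0.1x.
  const inputEquivalent = usage.input_tokens + written * 1.25 + read * 0.1;
  return (inputEquivalent * prices.inputPerMTok + usage.output_tokens * prices.outputPerMTok) / 1_000_000;
}

const cost = requestCostUsd(
  { input_tokens: 100, output_tokens: 200, cache_creation_input_tokens: 0, cache_read_input_tokens: 10_000 },
  { inputPerMTok: 3, outputPerMTok: 15 }, // illustrative prices only
);
console.log(cost.toFixed(4)); // 0.0063
```

Logging this per request makes it easy to confirm that cache hits are actually happening in production rather than assuming they are.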
Token Counting
Count tokens before sending to avoid exceeding limits or to budget requests:
const { input_tokens } = await client.messages.countTokens({
model: "claude-3-5-sonnet-20241022",
system: "You are a helpful assistant.",
messages: [{ role: "user", content: "Hello" }],
});
console.log(`This request will use ${input_tokens} input tokens`);
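A token count is most useful when compared against a budget. The helpers below are hypothetical. They assume the 200,000-token context window of the Claude 3 and 3.5 models in the table and take the model's output limit as a parameter.

```typescript
// Hypothetical budget helpers. The default window is the 200,000-token context of the
// Claude 3 / 3.5 models in the table above; pass a different value for other models.
function fitsContext(inputTokens: number, maxTokens: number, contextWindow = 200_000): boolean {
  // Prompt tokens plus the requested output cap must fit inside the window.
  return inputTokens + maxTokens <= contextWindow;
}

function maxOutputBudget(inputTokens: number, outputLimit: number, contextWindow = 200_000): number {
  // Whatever the prompt leaves free, capped at the model's own output limit, never negative.
  return Math.max(0, Math.min(outputLimit, contextWindow - inputTokens));
}

const promptTokens = 195_000; // e.g. the input_tokens value returned by countTokens
console.log(fitsContext(promptTokens, 4_096)); // true
console.log(maxOutputBudget(promptTokens, 8_192)); // 5000
```

Here a 195,000-token prompt leaves room for at most 5,000 output tokens, even though Claude 3.5 Sonnet can generate up to 8,192.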
Batch Messages API: 50% Discount
The Message Batches API processes requests asynchronously at a 50% discount on standard prices. A batch can take up to 24 hours to finish processing, although many finish much sooner. Use it for offline tasks where an immediate response is not required: document processing, bulk analysis, dataset generation.
const batch = await client.messages.batches.create({
requests: documents.map((doc, i) => ({
custom_id: `doc-${i}`,
params: {
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: `Summarize: ${doc}` }],
},
})),
});
// Poll until processing ends (documents is your own array of input strings)
let status = await client.messages.batches.retrieve(batch.id);
while (status.processing_status !== "ended") {
await new Promise((r) => setTimeout(r, 30_000)); // check every 30s
status = await client.messages.batches.retrieve(batch.id);
}
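Once processing ends, the results need to be read back and matched to their requests. Batch results are not guaranteed to come back in request order, so key them by `custom_id`. In the TypeScript SDK, the entries come from iterating `await client.messages.batches.results(batch.id)` with `for await`. The sketch below uses simplified local types (an assumption) so the matching logic runs on its own, and it collects entries that errored, were canceled, or expired for resubmission.

```typescript
// Simplified local types for batch result entries (assumptions, not the SDK's types).
type BatchResult =
  | { type: "succeeded"; message: { content: Array<{ type: string; text?: string }> } }
  | { type: "errored"; error: unknown }
  | { type: "canceled" }
  | { type: "expired" };
interface BatchEntry {
  custom_id: string;
  result: BatchResult;
}

// Results may arrive in any order, so index them by custom_id rather than position.
function indexResults(entries: Iterable<BatchEntry>): { texts: Map<string, string>; failed: string[] } {
  const texts = new Map<string, string>();
  const failed: string[] = [];
  for (const entry of entries) {
    if (entry.result.type === "succeeded") {
      const text = entry.result.message.content
        .filter((block) => block.type === "text")
        .map((block) => block.text ?? "")
        .join("");
      texts.set(entry.custom_id, text);
    } else {
      failed.push(entry.custom_id); // errored, canceled, or expired: candidates for resubmission
    }
  }
  return { texts, failed };
}

const batchIndex = indexResults([
  { custom_id: "doc-1", result: { type: "succeeded", message: { content: [{ type: "text", text: "Summary one." }] } } },
  { custom_id: "doc-0", result: { type: "errored", error: { type: "invalid_request_error" } } },
]);
console.log(batchIndex.texts.get("doc-1")); // Summary one.
```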
Error Handling
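import Anthropic from "@anthropic-ai/sdk";
// The SDK already retries connection errors and 408, 409, 429, and >=500 responses
// (including 529 overloaded) with exponential backoff, 2 times by default.
const client = new Anthropic({ maxRetries: 4 }); // raise the retry count if needed
try {
const response = await client.messages.create({ /* ... */ });
} catch (error) {
if (error instanceof Anthropic.APIError) {
switch (error.status) {
case 400:
// Validation error: malformed request or invalid parameters; fix it, do not retry
console.error("Bad request:", error.message);
break;
case 429:
// Still rate limited after the built-in retries: slow down or queue requests
console.error("Rate limited:", error.message);
break;
case 529:
// API still overloaded after the built-in retries: back off before trying again
console.error("Overloaded:", error.message);
break;
default:
// 500 and other errors: log, then surface to the caller or retry later
console.error(`API error ${error.status}:`, error.message);
}
} else {
throw error; // not an API error, e.g. a bug in your own code
}
}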
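For retries beyond what the SDK does automatically (for example, around a whole multi-step workflow), exponential backoff with jitter is a common approach. This is a hypothetical sketch: `backoffDelayMs` and `withRetries` are not SDK functions, the base and cap delays are arbitrary defaults, and the `isRetryable` predicate is left to you (for instance, checking for a 429 or 529 status).

```typescript
// Hypothetical helper: exponential backoff with "full jitter", preferring a
// server-provided retry-after value (in seconds) when one is available.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  retryAfterSeconds?: number,
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined && Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1_000;
  }
  // The ceiling doubles each attempt up to capMs; the delay is uniform in [0, ceiling).
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper: retries fn until it succeeds, the error is not retryable,
// or maxAttempts calls have been made.
async function withRetries<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Jitter spreads retries from many clients over time so they do not all hit the API at the same moment after an overload.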
Cost Optimization Strategy
- Use Haiku for high-volume simple tasks (classification, short Q&A, extraction)
- Use Sonnet for complex tasks (coding, analysis, long-form generation)
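- Enable prompt caching when a stable prefix (usually the system prompt) exceeds the minimum cacheable length (1,024 tokens for Claude 3.5 Sonnet and Claude 3 Opus, 2,048 for Claude 3 Haiku) and requests repeat within the cache lifetime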
- Use the Batch API for non-time-sensitive bulk processing
- Set max_tokens as tightly as practical; you are billed only for tokens actually generated, but a tight cap limits the cost and latency of runaway responses
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.