Anthropic API Guide: Claude Integration From Authentication to Prompt Caching
Complete guide to the Anthropic API - authentication, message format, streaming, tool use, prompt caching (cached input billed at 10% of the base price), batch processing, and production error handling.
The Anthropic API gives you access to Claude models via HTTP. The key concepts are these. Messages use a structured user/assistant turn format with an optional system prompt. Tool use requires a specific content block pattern. Prompt caching bills repeated prompt content, such as a large system prompt, at 10% of the normal input price. This guide covers everything you need from first request to production optimization.
Authentication and Setup
Install the official SDK:
```bash
pnpm add @anthropic-ai/sdk
```
Initialize the client:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
```
If you do not pass apiKey explicitly, the SDK reads ANTHROPIC_API_KEY from the environment automatically. Never hardcode API keys.
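You may want a missing key to surface at startup with a clear message. To do that, check the environment before constructing the client. Below is a minimal sketch. `requireApiKey` is a hypothetical helper, not part of the SDK. It takes the environment as a plain object so it stays easy to test.

```typescript
// Hypothetical helper: returns the trimmed API key, or throws a descriptive error if it is missing or blank.
function requireApiKey(
  env: Record<string, string | undefined>,
  name = "ANTHROPIC_API_KEY",
): string {
  const key = env[name]?.trim();
  if (!key) {
    throw new Error(`${name} is not set; export it in your shell or load it from a secrets manager`);
  }
  return key;
}
```

In Node you would call it as `new Anthropic({ apiKey: requireApiKey(process.env) })`.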
Model Names
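Model identifiers used throughout this guide. Anthropic releases new models and retires older ones over time, so confirm each identifier against the official models overview before deploying: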
| Model | Identifier | Use Case |
|---|---|---|
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | Complex tasks, coding, analysis |
| Claude 3 Haiku | claude-3-haiku-20240307 | Fast, cheap, simple tasks |
| Claude 3 Opus | claude-3-opus-20240229 | Maximum capability (slower, expensive) |
Use Haiku for high-volume, lower-complexity tasks (classification, extraction, simple Q&A). Use Sonnet for the majority of production tasks. Reserve Opus for cases where maximum quality justifies the cost.
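You can encode this guidance as a small routing function so that model choice lives in one place. Below is a sketch. The tier names and the routing rule are assumptions, and the identifiers come from the table above.

```typescript
// Identifiers from the model table above; keep them in config, since older models get retired.
const MODELS = {
  fast: "claude-3-haiku-20240307",
  balanced: "claude-3-5-sonnet-20241022",
  maximum: "claude-3-opus-20240229",
} as const;

type ModelId = (typeof MODELS)[keyof typeof MODELS];

interface TaskProfile {
  complex: boolean;          // coding, analysis, long-form generation
  qualityCritical?: boolean; // maximum quality justifies the extra cost
}

// Hypothetical routing rule mirroring the guidance above.
function pickModel(task: TaskProfile): ModelId {
  if (task.qualityCritical) return MODELS.maximum;
  return task.complex ? MODELS.balanced : MODELS.fast;
}
```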
Basic Message Request
```typescript
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: "You are a helpful assistant.",
  messages: [
    { role: "user", content: "What is the capital of France?" }
  ],
});

const text = response.content[0].type === "text"
  ? response.content[0].text
  : "";
```
Key parameters:
- model: which Claude model to use
- max_tokens: the maximum number of tokens to generate (required)
- system: the system prompt (optional; a top-level parameter, not a message role)
- messages: array of alternating user/assistant turns
The content array in the response can contain multiple content blocks. For text-only responses there will be one block with type: "text".
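Because content is an array of blocks, reading only `content[0]` can miss text when the model returns more than one block. One example is text that comes before a tool_use block. The minimal sketch below joins every text block instead. `ContentBlock` is a simplified, hypothetical stand-in for the SDK's own types and covers only two block kinds.

```typescript
// Simplified stand-in for the SDK's content block union (assumption: only these fields).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Join every text block instead of assuming content[0] is text.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}
```

With the SDK you can apply the same filter directly to `response.content`, because its blocks carry the same `type` discriminant.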
Multi-Turn Conversations
```typescript
// Annotating the array keeps role typed as "user" | "assistant" rather than string
const messages: Anthropic.MessageParam[] = [
  { role: "user", content: "What is React?" },
  { role: "assistant", content: "React is a JavaScript library..." },
  { role: "user", content: "How does it differ from Vue?" },
];

const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages,
});
```
You maintain conversation history by including all previous turns in the messages array. There is no server-side session - you send the full history with each request.
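Since the API is stateless, a small wrapper around the history array can keep turns in order. Below is a minimal sketch. The `Conversation` class is hypothetical. It requires the history to start with a user turn and to alternate strictly, which is a conservative assumption that matches the message format described above.

```typescript
type Role = "user" | "assistant";
interface Turn { role: Role; content: string; }

// Client-side history: this array is the whole conversation and is sent in full on every request.
class Conversation {
  private readonly turns: Turn[] = [];

  addUser(content: string): void { this.push("user", content); }
  addAssistant(content: string): void { this.push("assistant", content); }

  // Returns copies, suitable for passing as `messages` to client.messages.create.
  toMessages(): Turn[] { return this.turns.map((t) => ({ ...t })); }

  private push(role: Role, content: string): void {
    const last = this.turns[this.turns.length - 1];
    // Start with a user turn, then alternate user/assistant.
    const expected: Role = last === undefined || last.role === "assistant" ? "user" : "assistant";
    if (role !== expected) throw new Error(`expected a ${expected} turn, got ${role}`);
    this.turns.push({ role, content });
  }
}
```

On each request, pass `conversation.toMessages()` as `messages`. Then record the model's reply with `addAssistant` before adding the next user turn.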
Streaming Responses
```typescript
const stream = client.messages.stream({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a haiku about TypeScript." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

const finalMessage = await stream.finalMessage();
```
Streaming returns events as text is generated. The content_block_delta events with text_delta deltas contain the streaming text. finalMessage() waits for completion and returns the complete response object.
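You can factor the delta check in the loop into a pure reducer. That makes the accumulation logic testable without a network connection. This sketch assumes a simplified `StreamEvent` type that mirrors only the event fields read here. It is not the SDK's type definition.

```typescript
// Simplified stand-in for the stream event union (assumption: only the fields read here).
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | {
      type: "content_block_delta";
      index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" };

// Pure reducer: append text deltas and ignore every other event.
function applyEvent(text: string, event: StreamEvent): string {
  if (event.type !== "content_block_delta") return text;
  const { delta } = event;
  return delta.type === "text_delta" ? text + delta.text : text;
}
```

In a real loop the same check runs against the SDK's own event type, one event at a time. Tool-use streams emit `input_json_delta` deltas rather than text, and this reducer skips them.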
Tool Use (Function Calling)
Tool use in the Anthropic API requires you to handle a multi-turn pattern manually. The model returns a tool_use content block and stops with stop_reason "tool_use". You execute the tool, then send the result back as a tool_result content block inside a new user message.
```typescript
// Hypothetical tool implementation; replace with your own weather lookup
declare function executeWeatherTool(input: unknown): Promise<unknown>;

const tools = [
  {
    name: "get_weather",
    description: "Get the current weather for a location",
    input_schema: {
      type: "object" as const,
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];

// First request
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  tools,
  messages: [{ role: "user", content: "What is the weather in Tokyo?" }],
});

// Check if the model wants to use a tool
if (response.stop_reason === "tool_use") {
  // Handles only the first tool_use block; a response can contain several
  const toolUseBlock = response.content.find((b) => b.type === "tool_use");
  if (toolUseBlock && toolUseBlock.type === "tool_use") {
    const toolResult = await executeWeatherTool(toolUseBlock.input);

    // Second request: replay the assistant turn, then answer it with a tool_result block
    const finalResponse = await client.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      tools,
      messages: [
        { role: "user", content: "What is the weather in Tokyo?" },
        { role: "assistant", content: response.content },
        {
          role: "user",
          content: [
            {
              type: "tool_result",
              tool_use_id: toolUseBlock.id,
              content: JSON.stringify(toolResult),
            },
          ],
        },
      ],
    });
  }
}
```
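The example above answers only the first tool_use block, but one response can request several tools alongside text blocks. The sketch below is a dispatcher that answers all of them in a single user turn. The `buildToolResults` name, the handler map, and the simplified block types are hypothetical. The handlers are synchronous for brevity, whereas real tools would usually be awaited.

```typescript
// Simplified block shapes (assumption: only the fields used here).
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ResponseBlock = { type: "text"; text: string } | ToolUseBlock;
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type ToolHandler = (input: unknown) => unknown;

// Produce one tool_result per tool_use block, in the order the model requested them.
function buildToolResults(
  content: ResponseBlock[],
  handlers: Record<string, ToolHandler>,
): ToolResultBlock[] {
  return content
    .filter((b): b is ToolUseBlock => b.type === "tool_use")
    .map((block): ToolResultBlock => {
      const handler = handlers[block.name];
      if (!handler) {
        return { type: "tool_result", tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true };
      }
      try {
        return { type: "tool_result", tool_use_id: block.id, content: JSON.stringify(handler(block.input)) };
      } catch (err) {
        // is_error marks the call as failed so the model can recover or explain.
        return { type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true };
      }
    });
}
```

Send the returned array as the content of the next user message, after you replay `response.content` as the assistant turn.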
Prompt Caching: 90% Discount on Repeated Content
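Prompt caching is the most impactful cost optimization available on the Anthropic API. To use it, mark a content block with cache_control: { type: "ephemeral" }. Anthropic then caches the prompt prefix up to and including that block for 5 minutes, and each cache hit refreshes that lifetime. When later requests reuse the identical prefix, the cached tokens are charged at 10% of the normal input token price.
This is valuable when your system prompt is large and stable: detailed instructions, tool definitions, or reference documentation loaded into the system prompt. Prefixes below the minimum cacheable length are processed without caching. That minimum is 1,024 tokens for Claude 3.5 Sonnet and Claude 3 Opus, and 2,048 tokens for Claude 3 Haiku.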
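```typescript
// Hypothetical inputs: a large, stable documentation string and the user's question
declare const longDocumentationString: string;
declare const userQuestion: string;

const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text:
        "You are an expert coding assistant. Here is the full API documentation:\n\n" +
        longDocumentationString, // potentially thousands of tokens
      // Caches the prompt prefix up to and including this block
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: userQuestion }],
});
```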
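The first request writes the system prompt to the cache, and cache writes are billed at 25% above the normal input price. Later requests that reuse the same prefix within the cache lifetime are charged 10% for those tokens. The write premium still applies, and uncached tokens such as the user's question and the conversation history are billed at full price. As a result, overall input costs usually fall by less than the headline 90%. For applications where the system prompt is large and requests are frequent, the reduction can reach 60-80%.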
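The arithmetic below shows why the realized saving usually falls short of 90%, for a burst of requests that share one cached prefix. The multipliers (1.25x for 5-minute cache writes, 0.1x for cache reads) follow Anthropic's published pricing structure at the time of writing. The function names and the example workload are illustrative assumptions.

```typescript
// Assumed price multipliers relative to the base input-token price (5-minute cache).
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

interface Workload {
  prefixTokens: number;   // stable, cached prefix (system prompt, documentation)
  uncachedTokens: number; // per-request tokens after the cache breakpoint
  requests: number;       // requests that all arrive within the cache lifetime
  pricePerMTok: number;   // base input price in dollars per million tokens
}

// Without caching, every request pays full price for every input token.
function costWithoutCache(w: Workload): number {
  return ((w.prefixTokens + w.uncachedTokens) * w.requests * w.pricePerMTok) / 1e6;
}

// With caching, the first request writes the cache and later requests read it.
function costWithCache(w: Workload): number {
  if (w.requests === 0) return 0;
  const first = w.prefixTokens * CACHE_WRITE_MULTIPLIER + w.uncachedTokens;
  const rest = (w.requests - 1) * (w.prefixTokens * CACHE_READ_MULTIPLIER + w.uncachedTokens);
  return ((first + rest) * w.pricePerMTok) / 1e6;
}
```

Take a 10,000-token cached prefix, 500 uncached tokens per request, 10 requests, and an illustrative $3 per million input tokens. `costWithoutCache` returns $0.315 and `costWithCache` returns $0.0795, which is about a 75% reduction. The first request pays the write premium, and the per-request tokens are never discounted.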
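Cache activity appears in the response usage object. cache_creation_input_tokens counts tokens written to the cache, cache_read_input_tokens counts tokens read from it, and input_tokens counts only the remaining uncached input.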
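A quick way to confirm that caching works is to compute what share of each request's input came from the cache. Because input_tokens counts only the uncached remainder, total input is the sum of all three fields. The sketch below uses a simplified, hypothetical `Usage` interface and treats the cache fields as possibly absent or null.

```typescript
// Subset of the usage object (assumption: cache fields may be missing or null).
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Fraction of this request's input tokens that were served from the cache.
function cacheReadShare(u: Usage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const written = u.cache_creation_input_tokens ?? 0;
  const total = u.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

A share that stays near zero across requests usually means one of two things. Either the prefix changes between calls (for example, because a timestamp is embedded in the system prompt), or the prefix is below the minimum cacheable length.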
Token Counting
Count tokens before sending to avoid exceeding limits or to budget requests:
```typescript
const { input_tokens } = await client.messages.countTokens({
  model: "claude-3-5-sonnet-20241022",
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(`This request will use ${input_tokens} input tokens`);
```
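A token count is most useful when you compare it against a budget. The minimal sketch below assumes the 200,000-token context window of the Claude 3 and 3.5 models listed above. As a conservative check, it treats input plus max_tokens as the amount that must fit. The helper names are hypothetical.

```typescript
// Assumption: 200,000-token context window for the Claude 3 / 3.5 models in this guide.
const CONTEXT_WINDOW = 200_000;

// Conservative check: the prompt plus the requested output budget must fit in the window.
function fitsContext(inputTokens: number, maxTokens: number, contextWindow = CONTEXT_WINDOW): boolean {
  return inputTokens + maxTokens <= contextWindow;
}

// Largest max_tokens you can request for this input, capped at the model's output limit.
function maxOutputBudget(inputTokens: number, outputLimit: number, contextWindow = CONTEXT_WINDOW): number {
  return Math.max(0, Math.min(outputLimit, contextWindow - inputTokens));
}
```

Pass `input_tokens` from `countTokens` as `inputTokens`. For `outputLimit`, pass the model's output limit: 8,192 tokens for Claude 3.5 Sonnet and 4,096 for the Claude 3 models.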
Batch Messages API: 50% Discount
The Message Batches API processes requests asynchronously at a 50% discount on standard prices. Each batch finishes within 24 hours, often much sooner. Use it for offline processing tasks where an immediate response is not required: document processing, bulk analysis, dataset generation.
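```typescript
declare const documents: string[]; // hypothetical input: the texts to summarize

const batch = await client.messages.batches.create({
  requests: documents.map((doc, i) => ({
    custom_id: `doc-${i}`,
    params: {
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      messages: [{ role: "user", content: `Summarize: ${doc}` }],
    },
  })),
});

// Poll for completion, sleeping only between checks
while (true) {
  const status = await client.messages.batches.retrieve(batch.id);
  if (status.processing_status === "ended") break;
  await new Promise((r) => setTimeout(r, 30_000)); // check every 30s
}

// Read results; they may arrive out of order, so match them by custom_id
for await (const entry of await client.messages.batches.results(batch.id)) {
  if (entry.result.type === "succeeded") {
    console.log(entry.custom_id, entry.result.message.content);
  } else {
    console.error(entry.custom_id, entry.result.type);
  }
}
```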
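Each batch result reports one of four outcomes: succeeded, errored, canceled, or expired. The sketch below keys successful outputs by custom_id and collects the rest for inspection or resubmission. Its types are simplified, hypothetical stand-ins for the SDK's result objects.

```typescript
// Simplified result shapes (assumption: only the fields used here).
type BatchOutcome =
  | { type: "succeeded"; message: { content: unknown[] } }
  | { type: "errored"; error: unknown }
  | { type: "canceled" }
  | { type: "expired" };

interface BatchResult {
  custom_id: string;
  result: BatchOutcome;
}

// Group results by outcome; order-independent because everything is keyed by custom_id.
function partitionResults(entries: BatchResult[]): { succeeded: Record<string, unknown[]>; failed: string[] } {
  const succeeded: Record<string, unknown[]> = {};
  const failed: string[] = [];
  for (const { custom_id, result } of entries) {
    if (result.type === "succeeded") {
      succeeded[custom_id] = result.message.content;
    } else {
      failed.push(custom_id); // errored, canceled, or expired
    }
  }
  return { succeeded, failed };
}
```

Errored entries can be caused by invalid requests as well as transient problems, so inspect them before you resubmit.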
Error Handling
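```typescript
import Anthropic from "@anthropic-ai/sdk";

// The SDK already retries connection errors, 408, 409, 429, and 5xx responses
// (2 retries by default, with exponential backoff); adjust that with maxRetries.
const client = new Anthropic({ maxRetries: 3 });

try {
  const response = await client.messages.create({ /* ... */ });
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    switch (error.status) {
      case 400:
        // Validation error - malformed request, invalid parameters. Fix it; do not retry.
        console.error("Bad request:", error.message);
        break;
      case 401:
        // Authentication error - missing or invalid API key
        console.error("Check ANTHROPIC_API_KEY:", error.message);
        break;
      case 429: // Rate limit
      case 529: // API overloaded
      case 500: // Server error
        // The SDK's automatic retries are exhausted at this point.
        // Queue the request and retry later with exponential backoff.
        console.warn(`Transient error ${error.status}:`, error.message);
        break;
      default:
        throw error;
    }
  } else {
    throw error; // not an API error, e.g. a bug in your own code
  }
}
```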
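The SDK's built-in retries cover short blips. For requests you queue and retry yourself, a backoff helper keeps retries from piling onto an overloaded API. The sketch below uses exponential backoff with full jitter and honors a retry-after value when you have one. `backoffDelayMs`, `isRetryable`, and their defaults are hypothetical choices, not SDK functions.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// Uses the server's retry-after (seconds) when given; otherwise full jitter,
// a random delay between 0 and min(maxMs, baseMs * 2^attempt).
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined && Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Rate limits (429), overload (529), and other 5xx errors are worth retrying;
// validation and auth errors are not.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500);
}
```

In the catch block, check `isRetryable(error.status)`. If it returns true, wait `backoffDelayMs(attempt)` milliseconds before re-sending, and give up after a fixed number of attempts.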
Cost Optimization Strategy
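- Use Haiku for high-volume simple tasks (classification, short Q&A, extraction)
- Use Sonnet for complex tasks (coding, analysis, long-form generation)
- Enable prompt caching when a stable prefix meets the minimum cacheable length (1,024 tokens for Sonnet and Opus, 2,048 for Haiku) and requests arrive within the 5-minute cache lifetime
- Use the Batch API for non-time-sensitive bulk processing
- Set max_tokens to a realistic cap; you are billed for the output tokens actually generated, so max_tokens bounds your worst-case cost rather than lowering the typical one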
Keep Reading
Vercel AI SDK Guide - higher-level wrapper that simplifies Anthropic API integration
Written by
Mahmudul Haque Qudrati
CEO & ML Engineer
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.