Computer use agents interact with a computer the way a human does: they see a screenshot of the screen, decide what to click or type, execute the action, and see the result. Anthropic released computer use for Claude as a public beta in October 2024, making this capability available to developers through a mainstream API for the first time. In 2026, the technology is more reliable but still early-stage.
What Computer Use Actually Is
With computer use, the model receives a screenshot as input and outputs tool calls that correspond to mouse and keyboard actions. Schematically, these are click(x, y), type("text"), key("ctrl+c"), scroll(direction), and screenshot() to request the next view of the screen.
On every turn the model must do several things: understand the current state of the UI from a screenshot, decide what action to take next to move toward the goal, specify that action with precise coordinates, and evaluate the result once the next screenshot arrives. Each step can fail independently.
The Anthropic API exposes computer use as a set of tool definitions:
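import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Tool type strings are versioned. They must match both the model and the
# beta flag passed below; newer models generally expect newer tool versions,
# so check the current computer use documentation before running this.
tools = [
    {"type": "computer_20241022", "name": "computer", "display_width_px": 1920, "display_height_px": 1080},
    {"type": "bash_20241022", "name": "bash"},
]

# Computer use is a beta feature, so the request goes through client.beta
# and names the beta that corresponds to the tool version above.
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    tools=tools,
    betas=["computer-use-2024-10-22"],
    messages=[{"role": "user", "content": "Open Chrome, go to github.com, and find the trending repositories for Python today."}],
)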
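The model responds with tool calls, one turn at a time. The developer's runtime executes each action using a library like PyAutoGUI, xdotool (Linux), or a browser automation framework, captures a new screenshot, and returns it to the model as the tool result. The loop repeats until the model stops requesting actions.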
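The loop described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's reference implementation: the action dictionaries use a simplified, hypothetical shape, FakeDesktop stands in for a real executor such as PyAutoGUI or xdotool, and next_action stands in for the API call that asks the model what to do next.

```python
class FakeDesktop:
    """Hypothetical executor that records actions instead of driving a real screen."""

    def __init__(self):
        self.log = []

    def click(self, x, y):
        self.log.append(("click", x, y))

    def type_text(self, text):
        self.log.append(("type", text))

    def press_key(self, combo):
        self.log.append(("key", combo))

    def screenshot(self):
        # A real executor would return PNG bytes of the current screen.
        return b"fake-png-bytes"


def execute_action(desktop, action):
    """Run one requested action and return the screenshot taken afterwards."""
    kind = action["action"]
    if kind == "click":
        x, y = action["coordinate"]
        desktop.click(x, y)
    elif kind == "type":
        desktop.type_text(action["text"])
    elif kind == "key":
        desktop.press_key(action["text"])
    elif kind != "screenshot":
        raise ValueError(f"unsupported action: {kind}")
    return desktop.screenshot()


def run_agent(desktop, next_action, max_steps=20):
    """Alternate between asking for an action and executing it.

    next_action(screenshot) returns an action dict, or None when the task is done.
    Returns the number of actions executed.
    """
    screenshot = desktop.screenshot()
    for step in range(max_steps):
        action = next_action(screenshot)
        if action is None:
            return step
        screenshot = execute_action(desktop, action)
    raise RuntimeError("step budget exhausted; hand the task to a human")


# Usage: a scripted stand-in for the model, so the sketch runs offline.
scripted = iter([
    {"action": "click", "coordinate": [640, 360]},
    {"action": "type", "text": "github.com"},
    {"action": "key", "text": "Return"},
])
desktop = FakeDesktop()
steps = run_agent(desktop, lambda _screenshot: next(scripted, None))
```

Keeping the executor behind a small interface like this makes it possible to test the loop without a display, and the max_steps cap gives the runtime a clear point at which to stop and escalate.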
What It Can Do Well
Automated testing is one of the strongest use cases. Instead of writing Selenium or Playwright scripts that break whenever the UI changes, you describe the test scenario in natural language and the agent executes it. The agent can adapt to many UI changes, such as a renamed button or a relocated field, that would break a scripted test.
Data entry into legacy systems is another strong use case. Many organizations have internal tools with no API: old ERP systems, government portals, insurance platforms. An agent that can navigate these UIs like a human can automate tasks that were previously impossible to automate programmatically.
Legacy system integration extends this: extracting data from applications that have no export API, filling forms, copying information between systems. This is the "duct tape" automation that RPA (Robotic Process Automation) tools were built for, but with more flexibility.
Research tasks benefit from computer use when the information is not accessible via an API. Examples include navigating a website that blocks scrapers, filling out a form to access data, or working with a desktop application that has no programmatic interface.
Where It Still Fails
Dynamic UIs cause the most failures. JavaScript-heavy single-page apps change their DOM state after load. Modals appear and disappear. Elements shift position as content loads. An agent that clicks based on a screenshot taken 500ms ago may click the wrong element because the UI changed.
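One common mitigation on the runtime side is to wait for the screen to settle before sending a screenshot to the model. The sketch below is one possible approach, not something the API provides: capture is a hypothetical zero-argument function returning screenshot bytes, and the interval and timeout values are arbitrary.

```python
import time


def wait_until_stable(capture, interval=0.25, timeout=5.0, sleep=time.sleep):
    """Return a screenshot once two consecutive captures are identical.

    capture: hypothetical zero-argument callable returning screenshot bytes.
    Raises TimeoutError if the screen is still changing when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    previous = capture()
    while time.monotonic() < deadline:
        sleep(interval)
        current = capture()
        if current == previous:
            return current
        previous = current
    raise TimeoutError("screen did not settle before the timeout")


# Usage with canned frames standing in for a page that finishes loading.
frames = iter([b"loading", b"loading+modal", b"ready", b"ready"])
stable = wait_until_stable(lambda: next(frames), sleep=lambda _seconds: None)
```

Exact byte equality is a deliberately crude test. A blinking cursor, a clock, or an animation will keep it from ever matching, so a real implementation would likely compare only a region of interest or tolerate a small pixel difference.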
Multi-step forms accumulate errors. Each step requires correct state from the previous step. An error in step 3 of a 10-step form corrupts the remaining steps. Without the ability to backtrack and correct, the agent fails the entire task.
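A quick calculation shows how fast this compounds. It assumes each step succeeds independently with the same probability and that there is no recovery; the 95% per-step figure is purely illustrative, not a measured rate.

```python
def task_success_probability(per_step_success, steps):
    """Probability that every step succeeds, assuming independent steps and no retries."""
    return per_step_success ** steps


# Illustrative per-step success rate of 95% (an assumption, not a benchmark).
for steps in (5, 10, 20):
    print(steps, round(task_success_probability(0.95, steps), 3))
# prints:
# 5 0.774
# 10 0.599
# 20 0.358
```

Under these assumptions a 10-step form completes only about 60% of the time, which is why checking state after each step and recovering from errors matters more than small gains in per-step accuracy.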
CAPTCHAs are an explicit blocker. Services that detect automated access can interrupt the workflow at any point.
Coordinate precision is a persistent challenge. Clicking a small button or a link in dense text requires pixel-level accuracy that the model does not always achieve. When coordinates are off by 5 pixels, the click lands in the wrong place with no obvious error signal.
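The effect of a small offset is easy to reproduce with a hit test. The layout below is hypothetical: three stacked links, each 10 pixels tall, which is roughly what dense text looks like to a click.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in screen pixels."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x, y):
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    def center(self):
        return (self.left + self.width // 2, self.top + self.height // 2)


def element_at(boxes, x, y):
    """Return the index of the first box containing (x, y), or None."""
    for index, box in enumerate(boxes):
        if box.contains(x, y):
            return index
    return None


# Hypothetical layout: three links stacked vertically, 10 px tall each.
rows = [Box(left=100, top=100 + 10 * i, width=200, height=10) for i in range(3)]

cx, cy = rows[1].center()              # aim at the middle link
hit = element_at(rows, cx, cy)         # lands on link 1, as intended
miss = element_at(rows, cx, cy + 5)    # 5 px lower: lands on link 2 instead
```

The miss does not raise an error or return None. It activates a different, perfectly valid element, which is why this failure mode is hard to detect from the click alone.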
Computer Use vs RPA
Traditional RPA (UiPath, Automation Anywhere, Blue Prism) works by recording and replaying UI interactions using element identifiers rather than coordinates. This makes RPA more reliable for stable UIs: it finds the "Submit" button by its element ID, not its position on screen.
Computer use beats RPA when:
- The UI is highly variable or the RPA recording breaks frequently.
- The task requires understanding the content of the screen (reading a table, interpreting a result) to decide the next action.
- The task is too infrequent or complex to justify the cost of scripting an RPA workflow.
RPA beats computer use when:
- The UI is stable and well-understood.
- Reliability is paramount. RPA failure rates are lower for stable UIs.
- The task runs at high volume. RPA is faster and cheaper per execution.
When Computer Use Beats a Proper API Integration
The answer to "should I use computer use or build a proper API integration?" is almost always: build the API integration if one exists. An API is faster, more reliable, cheaper, and more maintainable than a computer use agent.
Computer use is the right answer when:
- There is no API.
- The API is too expensive, too slow to access, or requires authentication you cannot automate.
- The task is a one-off or low-frequency operation that does not justify the engineering cost of an integration.
- The system is actively hostile to API access (government portals, legacy enterprise software).
What to Watch in 2026 and Beyond
The reliability curve for computer use is improving. Better visual grounding (the model is more accurate at identifying element positions), better state tracking (the model remembers what happened in previous steps), and better error recovery (the model recognizes when a click failed and retries) are all active areas of improvement.
As a rough rule of thumb, the practical ceiling for current models is tasks with fewer than 20 steps, stable UIs, and some tolerance for occasional errors. Tasks beyond this ceiling generally require human supervision.