Browser Agents: Automating Web Tasks With AI

Browser agents let LLMs control a real web browser to navigate, click, fill forms, and extract data. Here is how they work, when they are worth the cost, and when they are not.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 18, 2026

7 min read

// tags

#browser-agents #web-automation #playwright #ai-agents #web-scraping


Browser agents combine a large language model with a browser automation library, giving the LLM the ability to navigate websites, click elements, fill forms, and extract content. Unlike traditional web scraping (which requires brittle CSS selectors or XPath) or RPA (which requires recording specific interaction sequences), browser agents understand the intent behind a task and can adapt to UI variations in real time.

How Browser Agents Work

The core loop is simple: take a screenshot (or the page's accessibility tree), pass it to the LLM with the current task, receive an action (click, type, navigate, extract), execute the action in the browser, and repeat until the task is complete.
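As a rough, library-free sketch of that loop: the names `Action`, `run_loop`, `observe`, `decide`, and `execute` below are illustrative assumptions, not part of any library. The browser and the LLM are passed in as plain callables, so the control flow can be read and exercised on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str           # "click", "type", "navigate", or "done"
    argument: str = ""  # coordinates, text, URL, or the final result


def run_loop(
    task: str,
    observe: Callable[[], str],
    decide: Callable[[str, str], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> Optional[str]:
    """Observe, decide, act; stop on "done" or when the step budget runs out."""
    for _ in range(max_steps):
        observation = observe()             # screenshot or accessibility tree
        action = decide(task, observation)  # the LLM call in a real agent
        if action.kind == "done":
            return action.argument
        execute(action)                     # the browser call in a real agent
    return None


# Stub "browser" and "model" so the sketch runs without either.
state = {"url": "about:blank"}
scripted = iter([Action("navigate", "https://example.com"), Action("done", "ok")])

result = run_loop(
    task="open example.com",
    observe=lambda: state["url"],
    decide=lambda task, obs: next(scripted),
    execute=lambda a: state.update(url=a.argument),
)
print(result, state["url"])
```

The step budget matters in practice: without it, a model that never emits a "done" action would keep the loop (and the API bill) running indefinitely.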

browser-use is one of the most widely adopted open-source Python libraries for this pattern. It integrates with Playwright and exposes a high-level agent interface:

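import asyncio
from browser_use import Agent
from langchain_anthropic import ChatAnthropic

async def main():
    # The task is plain language; the agent decides which pages to open and what to click.
    agent = Agent(
        task="Go to Hacker News and find the top 5 posts about AI agents today. Return the titles and URLs.",
        llm=ChatAnthropic(model="claude-opus-4-5"),
    )
    # run() drives the observe/act loop until the agent finishes or hits its step limit.
    result = await agent.run()
    print(result)

asyncio.run(main())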

browser-use handles the screenshot, action execution, and loop management. The LLM handles task understanding and action selection. Under the hood, Playwright controls a real Chromium browser.

For more control, you can use Playwright directly with an LLM:

import asyncio
import base64

import anthropic
from playwright.async_api import async_playwright

PROMPT = (
    "Task: {task}\n"
    "What action should I take next? Reply with exactly one of: "
    "CLICK x,y or TYPE text or NAVIGATE url or DONE result"
)

async def browser_agent(task: str, max_steps: int = 20) -> str | None:
    # Async client so the API call does not block the event loop.
    client = anthropic.AsyncAnthropic()

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()

            for _ in range(max_steps):
                # Send the current viewport to the model as a base64 PNG.
                screenshot = await page.screenshot()
                screenshot_b64 = base64.b64encode(screenshot).decode()

                response = await client.messages.create(
                    model="claude-opus-4-5",
                    max_tokens=1024,
                    messages=[{
                        "role": "user",
                        "content": [
                            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
                            {"type": "text", "text": PROMPT.format(task=task)},
                        ],
                    }],
                )

                # Split the reply into a verb and its argument, e.g. "CLICK 120,340".
                verb, _sep, arg = response.content[0].text.strip().partition(" ")
                if verb == "DONE":
                    return arg
                elif verb == "NAVIGATE":
                    await page.goto(arg.strip())
                elif verb == "CLICK":
                    x, y = arg.split(",")
                    await page.mouse.click(int(x), int(y))
                elif verb == "TYPE":
                    await page.keyboard.type(arg)

            return None  # step budget exhausted without a DONE reply
        finally:
            # Close the browser on every exit path, including the early DONE return.
            await browser.close()

This is a simplified version: it has no error handling and assumes the model always replies in the requested format. Still, it illustrates the core loop without framework abstraction.
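Because the loop above trusts the model to reply in exactly the requested format, one small hardening step is to validate each reply before acting on it. The following is a minimal sketch; `parse_action` and `ParsedAction` are hypothetical names, and the grammar is just the `CLICK x,y` / `TYPE text` / `NAVIGATE url` / `DONE result` format from the prompt.

```python
import re
from typing import NamedTuple, Optional


class ParsedAction(NamedTuple):
    verb: str
    arg: str
    xy: Optional[tuple] = None  # set only for CLICK


_CLICK_RE = re.compile(r"^\s*(\d+)\s*,\s*(\d+)\s*$")
_VERBS = {"CLICK", "TYPE", "NAVIGATE", "DONE"}


def parse_action(reply: str) -> ParsedAction:
    """Parse one model reply; raise ValueError instead of guessing."""
    verb, _sep, arg = reply.strip().partition(" ")
    if verb not in _VERBS:
        raise ValueError(f"unknown action: {reply!r}")
    if verb == "CLICK":
        match = _CLICK_RE.match(arg)
        if match is None:
            raise ValueError(f"CLICK needs integer 'x,y', got {arg!r}")
        return ParsedAction(verb, arg, (int(match[1]), int(match[2])))
    if verb == "NAVIGATE":
        arg = arg.strip()
        # Reject non-web schemes such as javascript: or file: before navigating.
        if not arg.startswith(("http://", "https://")):
            raise ValueError(f"NAVIGATE needs an http(s) URL, got {arg!r}")
    return ParsedAction(verb, arg)


print(parse_action("CLICK 120, 340"))
print(parse_action("DONE Top post: Example"))
```

On a `ValueError`, the caller could re-prompt the model with the error message rather than executing a malformed action.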

Practical Use Cases

Competitive intelligence is one of the highest-value use cases: monitoring competitor pricing, tracking product changes, and collecting market data from sources with no public API. A browser agent can log in, navigate to the right page, extract the data, and format it for analysis.

Form filling at scale: submitting the same information to multiple portals (insurance quotes, vendor applications, government registrations). A browser agent handles the variation in form design that breaks scripted automation.
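One way to set this up, sketched under assumptions: keep the applicant data in a single record and generate one natural-language task per portal, leaving the field mapping to the agent. The portal URLs, the `build_task` helper, the record fields, and the "stop before submit" instruction below are all made up for illustration.

```python
import json

applicant = {
    "company_name": "Example Ltd",
    "contact_email": "ops@example.com",
    "employees": 42,
}

# Hypothetical portals; each would have its own form layout.
portals = [
    "https://portal-a.example.com/apply",
    "https://portal-b.example.com/vendors/new",
]


def build_task(url: str, record: dict) -> str:
    """Describe the goal and the data; do not reference selectors or field ids."""
    return (
        f"Go to {url} and complete the application form using this data: "
        f"{json.dumps(record, sort_keys=True)}. "
        "Match each value to the closest form field. "
        "Stop before the final submit button and report any field you could not fill."
    )


tasks = [build_task(url, applicant) for url in portals]
for task in tasks:
    print(task)
```

Each string can then be handed to an agent as its task, in the same way as the `task=` argument in the browser-use example earlier.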

Web scraping when APIs do not exist: extracting structured data from websites that block programmatic requests or do not expose an API. Agents that drive a real browser are generally harder to detect than plain HTTP clients, because they execute JavaScript and load pages the way an ordinary visitor does.

Testing web applications with natural language test cases: describe user flows in plain English and have the agent execute them, reporting failures. This is slower than Playwright scripts but needs far less script maintenance when the UI changes.
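A sketch of what such a harness could look like follows. It assumes a `run_agent` coroutine that takes a task string and returns the agent's final text; that function, the `PASS`/`FAIL` reply convention, and the test cases are illustrative assumptions, and a stub stands in for the real agent so the example runs on its own.

```python
import asyncio
from typing import Awaitable, Callable

TEST_CASES = [
    "Log in with the demo account and confirm the dashboard shows a welcome message.",
    "Add any item to the cart and confirm the cart badge shows 1.",
]

INSTRUCTION = " Reply with PASS or FAIL on the first line, then a one-sentence reason."


async def run_suite(
    cases: list[str],
    run_agent: Callable[[str], Awaitable[str]],
) -> dict[str, bool]:
    """Run each plain-English case through the agent and record pass/fail."""
    results = {}
    for case in cases:
        try:
            reply = await run_agent(case + INSTRUCTION)
            results[case] = reply.strip().upper().startswith("PASS")
        except Exception:
            # An agent crash or timeout counts as a failed case, not a failed suite.
            results[case] = False
    return results


async def stub_agent(task: str) -> str:
    # Stand-in for a real browser agent: pretends the cart flow is broken.
    return "FAIL\nCart badge stayed at 0." if "cart" in task else "PASS\nWelcome shown."


results = asyncio.run(run_suite(TEST_CASES, stub_agent))
for case, passed in results.items():
    print("PASS" if passed else "FAIL", "-", case)
```

Asking the agent for a fixed first-line verdict keeps the result machine-readable while the reason line stays available for a human reading the report.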
