Browser-Use is an open-source Python framework that lets AI agents control web browsers autonomously: clicking, typing, scrolling, filling forms, and navigating multi-step workflows from natural-language instructions. With 93k+ GitHub stars, it's one of the most popular AI browser automation tools. It works with Claude, GPT, Gemini, and local models, and ships with a Claude Code skill for direct integration.
*Source: GitHub (browser-use/browser-use) | browser-use.com | Benchmark*
What It Does
Give a browser agent a task in plain English, and it figures out how to complete it:
import asyncio

from browser_use import Agent, Browser, ChatBrowserUse


async def main():
    agent = Agent(
        task="Go to Amazon, search for 'wireless headphones under $50', "
             "sort by best reviews, and save the top 3 results to a file",
        llm=ChatBrowserUse(),
        browser=Browser(),
    )
    await agent.run()


asyncio.run(main())
The agent processes the page (via DOM/HTML extraction, plus optional vision screenshots), reasons about what to do, takes actions, and repeats until the task is done.
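The loop described above can be sketched in plain Python. This is an illustrative model only, not Browser-Use's internals: `FakePage`, `decide`, and `run_loop` are hypothetical names, and the "LLM" is a hard-coded rule so the example runs without a browser or an API key.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page (hypothetical, for illustration only)."""
    url: str = "about:blank"
    typed: list = field(default_factory=list)

    def observe(self) -> dict:
        # A real agent would extract the DOM (and optionally a screenshot) here.
        return {"url": self.url, "typed": list(self.typed)}


def decide(task: str, state: dict) -> dict:
    """Toy replacement for the LLM: pick the next action from the observed state."""
    if state["url"] == "about:blank":
        return {"action": "navigate", "url": "https://example.com"}
    if not state["typed"]:
        return {"action": "type", "text": task}
    return {"action": "done"}


def run_loop(task: str, page: FakePage, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        state = page.observe()
        action = decide(task, state)
        history.append(action["action"])
        if action["action"] == "done":
            break
        if action["action"] == "navigate":
            page.url = action["url"]
        elif action["action"] == "type":
            page.typed.append(action["text"])
    return history


steps = run_loop("wireless headphones", FakePage())
print(steps)
```

The step cap is a choice made in this sketch so that a confused agent cannot loop forever; the limit of 10 is arbitrary.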
How a Normal User Would Use This
Scenario 1: Job Application Automation
You have a resume and want to apply to 20 jobs on LinkedIn:
agent = Agent(
    task="Go to LinkedIn Jobs, search for 'Python developer remote', "
         "apply to the first 20 jobs using Easy Apply. "
         "Use my resume details: [name, email, experience]",
    llm=ChatBrowserUse(),
)
Scenario 2: Price Comparison Shopping
agent = Agent(
    task="Compare prices for 'Sony WH-1000XM5' across Amazon, "
         "Best Buy, and Walmart. Create a markdown table with "
         "price, shipping, and availability for each store.",
    llm=ChatBrowserUse(),
)
Scenario 3: Research Data Collection
agent = Agent(
    task="Go to Google Scholar, search for 'transformer attention efficiency 2026', "
         "collect the titles, authors, and citation counts of the top 20 results, "
         "and save to research_papers.csv",
    llm=ChatBrowserUse(),
)
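If you prefer to write the file yourself from records the agent collects, a minimal sketch with the standard `csv` module might look like this. The column names and the `save_papers` helper are assumptions for illustration, and the single row is a placeholder, not real search output.

```python
import csv
import io

# Assumed column layout, based on the fields named in the task above.
FIELDS = ["title", "authors", "citations"]


def save_papers(rows, out):
    """Write paper records to a CSV stream with a fixed header order."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder record, not real Google Scholar data.
rows = [{"title": "Example Paper", "authors": "A. Author; B. Author", "citations": 0}]

# An in-memory buffer keeps the example side-effect free; use
# open("research_papers.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
save_papers(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```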
Scenario 4: CLI Mode (No Code Needed)
# Interactive browser control from your terminal
browser-use open https://example.com
browser-use state # List all clickable elements
browser-use click 5 # Click element #5
browser-use type "search query"
browser-use screenshot result.png
browser-use close
Supported Models
| Model | Speed | Cost | Best For |
|---|---|---|---|
| ChatBrowserUse (recommended) | 3-5x faster | $0.20 input / $2.00 output per 1M tokens | Best accuracy, purpose-built |
| GPT-4o | Fast | Standard OpenAI pricing | General use |
| Claude Sonnet/Opus | Fast | Standard Anthropic pricing | Complex reasoning tasks |
| Gemini 2.0 Flash | Very fast | Google pricing | Speed-sensitive tasks |
| Ollama (local) | Varies | Free | Privacy-sensitive work |
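To get a feel for what the ChatBrowserUse pricing means per task, here is a rough cost estimate. It assumes the two figures in the table are input and output prices per million tokens; the step count and per-step token counts are hypothetical, so measure your own runs before budgeting.

```python
# Prices from the table above, in USD per 1M tokens.
# Assumption: the first figure is the input price, the second the output price.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one agent run."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical run: 20 steps, about 10,000 input and 200 output tokens per step.
cost = estimate_cost(input_tokens=20 * 10_000, output_tokens=20 * 200)
print(f"${cost:.3f}")  # prints $0.048
```

Input tokens dominate in this example because each step resends the page state to the model, so trimming what the agent sees per step is likely the main cost lever.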
Open-Source vs Cloud
| Feature | Open-Source (Self-hosted) | Cloud Agent |
|---|---|---|
| Cost | Free + model API costs | Subscription |
| CAPTCHA solving | Manual/third-party | Built-in |
| Stealth/anti-detection | Basic | Advanced fingerprinting |
| Proxy rotation | Configure yourself | Built-in |
| Persistent memory | Custom implementation | Built-in |
| Custom tools | Full control | 1000+ integrations |
Claude Code Integration
Install the Browser-Use skill directly:
mkdir -p ~/.claude/skills/browser-use
curl -o ~/.claude/skills/browser-use/SKILL.md \
https://raw.githubusercontent.com/browser-use/browser-use/main/skills/browser-use/SKILL.md
Now Claude Code can automate browser tasks as part of your coding workflow.
Architecture
Natural Language Task
          ↓
┌────────────────────┐
│  LLM (reasoning)   │  Decides what to do next
└─────────┬──────────┘
          ↓
┌────────────────────┐
│  Action Pipeline   │  Translates to browser commands
└─────────┬──────────┘
          ↓
┌────────────────────┐
│ Chromium (via CDP) │  Executes: click, type, scroll, screenshot
└─────────┬──────────┘
          ↓
┌────────────────────┐
│  Context Manager   │  Extracts DOM + screenshots for next decision
└────────────────────┘
          ↓
   Loop until done
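The "Action Pipeline" stage, which turns the model's decisions into browser commands, can be pictured as a dispatch table. The sketch below is an assumption about the general pattern, not Browser-Use's actual code: `RecordingBrowser`, `build_pipeline`, `execute`, and the action dictionary shape are all hypothetical, and the browser only records commands instead of talking to Chromium over CDP.

```python
from typing import Callable, Dict, List


class RecordingBrowser:
    """Hypothetical browser stand-in that records commands instead of executing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, index: int) -> None:
        self.log.append(f"click:{index}")

    def type_text(self, text: str) -> None:
        self.log.append(f"type:{text}")

    def scroll(self, pixels: int) -> None:
        self.log.append(f"scroll:{pixels}")


def build_pipeline(browser: RecordingBrowser) -> Dict[str, Callable[[dict], None]]:
    """Map action names (as a model might emit them) to browser commands."""
    return {
        "click": lambda a: browser.click(a["index"]),
        "type": lambda a: browser.type_text(a["text"]),
        "scroll": lambda a: browser.scroll(a["pixels"]),
    }


def execute(actions: List[dict], browser: RecordingBrowser) -> None:
    """Run each action through the dispatch table, in order."""
    pipeline = build_pipeline(browser)
    for action in actions:
        handler = pipeline.get(action["name"])
        if handler is None:
            # Reject anything outside the known action set rather than guessing.
            raise ValueError(f"unknown action: {action['name']}")
        handler(action)


browser = RecordingBrowser()
execute(
    [
        {"name": "click", "index": 5},
        {"name": "type", "text": "search query"},
        {"name": "scroll", "pixels": 400},
    ],
    browser,
)
print(browser.log)
```

Keeping the set of allowed actions explicit means a malformed model output fails loudly instead of triggering an unintended browser command.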
Real-World Use Cases
- Form automation: Job applications, survey completion, account registration across any website.
- E-commerce: Grocery shopping lists, price comparison, checkout automation.
- Research: Web scraping, competitor monitoring, data collection from sites without APIs.
- Administrative: Appointment scheduling, document downloads, account maintenance.
- Testing: QA workflows, multi-step user journey testing, accessibility auditing.
- Social media: Content posting, engagement tracking, analytics collection.
Note: Some use cases (account registration, checkout automation, job applications at scale) may conflict with website Terms of Service. Always verify you have permission before automating interactions with third-party sites.
How the LearnAI Team Could Use This
- Web automation lab: Students build browser agents for real tasks (price comparison, data collection) and learn about AI tool use, error recovery, and task decomposition.
- Agent evaluation project: Compare Browser-Use across different LLM backends on the same 100-task benchmark, which teaches model selection and cost-performance tradeoffs.
- Ethics discussion: When should AI automate web interactions? CAPTCHA solving, anti-detection, and Terms of Service implications.
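For the agent evaluation project, students could aggregate per-task results into a success rate and a cost per successful task for each backend. The sketch below assumes a simple record format (`backend`, `task`, `success`, `cost_usd`) that is not part of Browser-Use; the rows are made-up placeholders that only show the data shape, not benchmark results.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate per-task results into success rate and cost per success, per backend."""
    totals = defaultdict(lambda: {"tasks": 0, "successes": 0, "cost": 0.0})
    for r in results:
        t = totals[r["backend"]]
        t["tasks"] += 1
        t["successes"] += int(r["success"])
        t["cost"] += r["cost_usd"]

    summary = {}
    for backend, t in totals.items():
        summary[backend] = {
            "success_rate": t["successes"] / t["tasks"],
            # Undefined when nothing succeeded, so report None instead of dividing by zero.
            "cost_per_success": t["cost"] / t["successes"] if t["successes"] else None,
        }
    return summary


# Placeholder rows with made-up values, only to show the data shape.
results = [
    {"backend": "backend_a", "task": 1, "success": True, "cost_usd": 0.5},
    {"backend": "backend_a", "task": 2, "success": False, "cost_usd": 0.5},
    {"backend": "backend_b", "task": 1, "success": True, "cost_usd": 0.25},
    {"backend": "backend_b", "task": 2, "success": True, "cost_usd": 0.25},
]
summary = summarize(results)
print(summary)
```

Cost per success charges failed attempts to the backend that made them, so a cheap but unreliable model does not look better than it is.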
Links
- GitHub: browser-use/browser-use (93k+ stars)
- Website: browser-use.com
- Benchmark: browser-use/benchmark
- Claude Code Skill: Available via GitHub raw URL