Table of Contents
- Introduction
- What is browser-use?
- Synthesizing Hacker News Themes
- Working with the Output
- What This Run Cost
- A Second Experiment: Scraping Newegg
- Trade-offs
- When to Use Each Tool
- Conclusion
Introduction
Traditional browser automation tools like Playwright require you to write CSS selectors for every element you want to extract:
# Extract a laptop's price from a product card
price = await card.locator("h4.price").text_content()
This approach works, but it tightly couples your scraper to the site’s HTML structure. If a class name changes, your scraper breaks. You then have to inspect the updated HTML and rewrite your selectors from scratch.
What if you could just describe what you want in plain English?
# Tell an AI agent what to extract
agent = Agent(
task="Find gaming laptops under $1500 and extract the name, price, and GPU",
llm=ChatOpenAI(model="gpt-4o"),
)
That is what browser-use does. Instead of describing the steps, you describe the goal, and an LLM works out the steps for you.
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Stay Current with CodeCut
Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.
What is browser-use?
browser-use is a Python library that gives an LLM a working browser. Under the hood it uses Playwright to drive the browser, but the LLM reads each page and decides what to click, type, and extract. You write the task in plain English, and the agent figures out the rest.
To install browser-use:
pip install browser-use
playwright install chromium
This article uses browser-use v0.12.5.
Since this tutorial uses OpenAI’s GPT-4o as the agent’s model, you will need an OpenAI API key. Store it in a .env file:
OPENAI_API_KEY=your-key-here
Then load it with python-dotenv:
import nest_asyncio

# Allow nested event loops (needed when running asyncio inside Jupyter)
nest_asyncio.apply()

from dotenv import load_dotenv

# Read OPENAI_API_KEY from the .env file into the environment
load_dotenv()
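Before paying for a run, it is worth failing fast if the key never made it into the environment. A small helper sketch (the function name `require_env` is ours, not part of browser-use or python-dotenv):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise before any API call is made."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` turns a confusing mid-run auth error into an immediate, obvious one.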
Synthesizing Hacker News Themes
We will point browser-use at Hacker News and ask it to:
Find the top AI-related stories on the front page and identify the common themes.
This task is a good fit for browser-use because it mixes extraction with judgment: the agent has to classify each story and then reason across all of them to pull out themes.
Let’s set it up. First, define the output schema using Pydantic. This tells browser-use what structure to return:
import asyncio
from pydantic import BaseModel
from browser_use import Agent, ChatOpenAI

class HNResults(BaseModel):
    titles: list[str]
    points: list[int]
    comments: list[int]
    urls: list[str]
    themes: list[str]
    summary: str
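The schema can be exercised offline before you spend anything on an agent run. This is how the agent's final result gets parsed later, so validating a hand-written payload confirms the field names and types line up (the sample JSON here is made up for illustration):

```python
from pydantic import BaseModel

class HNResults(BaseModel):
    titles: list[str]
    points: list[int]
    comments: list[int]
    urls: list[str]
    themes: list[str]
    summary: str

# A hand-written payload in the shape the agent is asked to return
sample = (
    '{"titles": ["Example story"], "points": [100], "comments": [42], '
    '"urls": ["https://example.com"], "themes": ["testing"], "summary": "A sample."}'
)

# Same call used on the agent's final result below
parsed = HNResults.model_validate_json(sample)
```

If the JSON is malformed or a field is missing, `model_validate_json` raises a `ValidationError` with a precise message, which is far easier to debug than a silent mismatch after a 50-second run.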
Now define the agent. It takes four parameters:
- `task`: the natural-language instructions for what you want the agent to do.
- `llm`: the model that drives the agent. Here we use GPT-4o.
- `output_model_schema`: the Pydantic schema that tells the agent how to structure its final result.
- `calculate_cost`: when set to `True`, browser-use tracks token usage and dollar cost so you can inspect them later.
async def find_ai_stories():
    # Configure the agent with task, model, schema, and cost tracking
    agent = Agent(
        task=(
            "Go to https://news.ycombinator.com/ and find "
            "all stories on the front page that are "
            "about AI, LLMs, or AI agents. "
            "For each story, extract the title, points, "
            "comment count, and URL. "
            "Then identify 2-3 common themes across these "
            "stories and write a short summary of what the "
            "Hacker News community is currently excited or "
            "concerned about regarding AI."
        ),
        llm=ChatOpenAI(model="gpt-4o"),
        output_model_schema=HNResults,
        calculate_cost=True,
    )

    # Run the agent and get the structured result
    history = await agent.run()
    result = history.final_result()

    # Parse into HNResults, falling back to an empty result if nothing came back
    parsed = HNResults.model_validate_json(result) if result else HNResults(
        titles=[], points=[], comments=[], urls=[], themes=[], summary="",
    )
    return parsed, history
Run the agent:
results, history = asyncio.run(find_ai_stories())
print(f"Found {len(results.titles)} AI-related stories")
📍 Step 1:
👍 Eval: Successfully navigated to the Hacker News front page
and identified several stories that may relate to AI.
🧠 Memory: On the Hacker News front page. Identified potential
AI-related stories by their titles. Need to extract
details for analysis.
🎯 Next goal: Extract details from the identified AI-related
stories on the front page.
▶️ extract: query: AI|LLM|AI agent, extract_links: True
📍 Step 2:
👍 Eval: Successfully extracted details of AI-related stories
from Hacker News.
🧠 Memory: Extracted details of AI-related stories. Ready to
analyze for common themes and summarize.
🎯 Next goal: Analyze the extracted stories to identify 2-3
common themes and write a summary.
▶️ done: 8 stories extracted, 3 themes identified
✅ Task completed successfully
⚠️ Agent reported success but judge thinks task failed
⚖️ Judge Verdict: ❌ FAIL
Failure Reason: The agent included a non-AI related story
('Solod – A subset of Go that translates to C') in its results.
Found 8 AI-related stories
Each step in the log shows the agent’s internal reasoning loop. There are four fields, and they work together to drive the next action:
- Eval checks whether the last action worked. This makes the agent self-correcting, so failures get retried instead of silently propagating.
- Memory tracks what has been done so far. This stops the agent from repeating expensive actions and is what makes multi-step tasks possible.
- Next goal plans the next action based on eval and memory. The LLM decides what to do next, so you do not have to write a state machine.
- Action executes the plan from a built-in toolkit of `navigate`, `click`, `extract`, and `scroll`. The agent picks the right tool on its own, so the same code works on a new site.
%%{init: {"theme": "dark"}}%%
flowchart TD
Task([Task prompt]) --> Eval
Eval[Eval: check last action] --> Memory[Memory: track progress]
Memory --> NextGoal[Next goal: plan next step]
NextGoal --> Action[Action: execute from toolkit]
Action -->|Not done| Eval
Action -->|Done| Result([Worker result])

Once the worker declares success, a second LLM steps in as a judge.
This is a clever setup. Workers rarely catch their own mistakes without an independent perspective. Since the judge only sees the prompt and final result, it can objectively evaluate the output.
Here, it correctly identified that Solod is not an AI story.
%%{init: {"theme": "dark"}}%%
flowchart TD
Result([Worker result]) --> Review[Judge: review against original task]
Review --> Verdict{Pass or fail?}
Verdict -->|Pass| Output([Trusted output])
Verdict -->|Fail| Retry([Retry or alert])

Let's look at the results:
for title, points, comments, url in zip(
    results.titles, results.points, results.comments, results.urls
):
    print(f"  {title}")
    print(f"  {points} points | {comments} comments")
    print(f"  {url}")
    print()

print("Themes:")
for theme in results.themes:
    print(f"  - {theme}")

print(f"\nSummary: {results.summary}")
Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS
363 points | 170 comments
https://github.com/matthartman/ghost-pepper
Solod – A subset of Go that translates to C
122 points | 30 comments
https://github.com/solod-dev/solod
Issue: Claude Code is unusable for complex engineering tasks with Feb updates
1047 points | 587 comments
https://github.com/anthropics/claude-code/issues/42796
Sam Altman may control our future – can he be trusted?
1340 points | 542 comments
https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted
Launch HN: Freestyle – Sandboxes for Coding Agents
265 points | 145 comments
https://www.freestyle.sh/
A cryptography engineer's perspective on quantum computing timelines
455 points | 186 comments
https://words.filippo.io/crqc-timeline/
AI singer now occupies eleven spots on iTunes singles chart
166 points | 257 comments
https://www.showbiz411.com/2026/04/05/itunes-takeover-by-fake-ai-singer...
Show HN: Hippo, biologically inspired memory for AI agents
90 points | 17 comments
https://github.com/kitfunso/hippo-memory
Themes:
- AI in media and entertainment
- Ethical concerns about AI leadership
- Technical challenges in AI development
Summary: The Hacker News community is currently excited about advancements in AI tools like speech-to-text applications and biologically inspired memory systems. There are also significant discussions around ethical concerns regarding influential figures in AI like Sam Altman. Additionally, there are technical challenges being highlighted in the development of complex AI systems.
Most of the output is solid:
- Structured fields are accurate. Titles, points, comments, and URLs are correct for every row.
- Themes map to real stories. “AI in media and entertainment” lines up with the AI singer post, “Ethical concerns about AI leadership” with the Sam Altman piece, and “Technical challenges in AI development” with the Claude Code issue.
- The summary reads like a take, not a list. Instead of restating each story, it picks out what the community is excited about (new AI tools), worried about (Sam Altman), and struggling with (complex AI systems).
However, the classifications are wrong in two places. Solod (a Go-to-C transpiler) and the quantum computing post both made the list even though neither is about AI.
Working with the Output
Once you have the structured output, you can do something with it. The simplest first step is to load it into a pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    "title": results.titles,
    "points": results.points,
    "comments": results.comments,
    "url": results.urls,
})
df
| | title | points | comments | url |
|---|---|---|---|---|
| 0 | Show HN: Ghost Pepper – Local hold-to-talk spe… | 363 | 170 | https://github.com/matthartman/ghost-pepper |
| 1 | Solod – A subset of Go that translates to C | 122 | 30 | https://github.com/solod-dev/solod |
| 2 | Issue: Claude Code is unusable for complex eng… | 1047 | 587 | https://github.com/anthropics/claude-code/issu… |
| 3 | Sam Altman may control our future – can he be … | 1340 | 542 | https://www.newyorker.com/magazine/2026/04/13/… |
| 4 | Launch HN: Freestyle – Sandboxes for Coding Ag… | 265 | 145 | https://www.freestyle.sh/ |
| 5 | A cryptography engineer’s perspective on quant… | 455 | 186 | https://words.filippo.io/crqc-timeline/ |
| 6 | AI singer now occupies eleven spots on iTunes … | 166 | 257 | https://www.showbiz411.com/2026/04/05/itunes-t… |
| 7 | Show HN: Hippo, biologically inspired memory f… | 90 | 17 | https://github.com/kitfunso/hippo-memory |
Now you can drop the two misclassified rows, Solod (index 1) and the quantum computing post (index 5), and keep only the true AI stories:
misclassified_rows = [1, 5]
ai_only = df.drop(misclassified_rows).reset_index(drop=True)
ai_only
| | title | points | comments | url |
|---|---|---|---|---|
| 0 | Show HN: Ghost Pepper – Local hold-to-talk spe… | 363 | 170 | https://github.com/matthartman/ghost-pepper |
| 1 | Issue: Claude Code is unusable for complex eng… | 1047 | 587 | https://github.com/anthropics/claude-code/issu… |
| 2 | Sam Altman may control our future – can he be … | 1340 | 542 | https://www.newyorker.com/magazine/2026/04/13/… |
| 3 | Launch HN: Freestyle – Sandboxes for Coding Ag… | 265 | 145 | https://www.freestyle.sh/ |
| 4 | AI singer now occupies eleven spots on iTunes … | 166 | 257 | https://www.showbiz411.com/2026/04/05/itunes-t… |
| 5 | Show HN: Hippo, biologically inspired memory f… | 90 | 17 | https://github.com/kitfunso/hippo-memory |
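With a clean DataFrame you can start asking questions of the data, for example which stories generated the most discussion relative to their score. A self-contained sketch, with two rows hard-coded in the same shape as `ai_only` (values taken from the run above):

```python
import pandas as pd

# Two rows in the same shape as the `ai_only` frame, values from the run above
ai_only = pd.DataFrame({
    "title": ["Sam Altman may control our future", "Show HN: Hippo"],
    "points": [1340, 90],
    "comments": [542, 17],
})

# Comments per point is a rough proxy for how contentious a story is
ai_only["comments_per_point"] = ai_only["comments"] / ai_only["points"]
ranked = ai_only.sort_values("comments_per_point", ascending=False)
```

On the full table, this kind of ranking surfaces debate-heavy posts (like the Sam Altman profile) over upvote-heavy but quieter ones.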
What This Run Cost
Now let’s check what this synthesis cost in tokens, dollars, and time.
usage = history.usage
print(f"Total tokens: {usage.total_tokens:,}")
print(f"Total cost: ${usage.total_cost:.4f}")
print(f"Steps: {len(history.history)}")
print(f"Duration: {history.total_duration_seconds():.1f}s")
Total tokens: 42,339
Total cost: $0.1157
Steps: 3
Duration: 53.2s
12 cents and 53 seconds is what this run cost. Here is what the same money buys you elsewhere:
- GPT-4o. A single long-prompt API call. Same dollar cost, but no browser, no scraping, no real page involved.
- Your own time. Opening Hacker News, skimming 30 stories, copying the AI ones into a doc, and writing a summary by hand. Probably 10 to 20 minutes of focused work.
- Playwright + LLM. A scraper to grab the page, one LLM call to classify, another to synthesize, and the code to glue them together. More code, more failure points, and likely more total tokens.
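To put the per-run cost in context, the arithmetic is simple. The monthly figure below assumes a hypothetical hourly schedule, which you would adjust to your own cadence:

```python
# Back-of-envelope projection using the numbers reported above
total_cost = 0.1157    # dollars per run
steps = 3
total_tokens = 42_339

cost_per_step = total_cost / steps
cost_per_1k_tokens = 1000 * total_cost / total_tokens
monthly_cost_hourly_runs = total_cost * 24 * 30  # one run per hour for 30 days

print(f"${cost_per_step:.4f} per step, ${monthly_cost_hourly_runs:.2f}/month if run hourly")
```

Roughly $83 a month at hourly frequency is where a hand-written Playwright script starts to look attractive again.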
If 12 cents per run is too much for your use case, there are three ways to bring it down:
- Use a cheaper model. Swap `gpt-4o` for `gpt-4o-mini` or `claude-haiku`. The same task often runs for under 2 cents, with some loss of reasoning quality.
- Run a local model. browser-use works with Ollama and LM Studio, so the dollar cost drops to zero. The trade-off is that local models need to be 30B+ to handle structured output reliably, and each step is slower.
- Tighten the prompt. Shorter tasks mean fewer steps, and each step carries the full conversation history forward, so cutting one step can save thousands of tokens.
For a full walkthrough on setting up local LLMs for workflows like this, see our LangChain and Ollama guide.
A Second Experiment: Scraping Newegg
browser-use doesn’t always work this well. To show where it breaks, I ran it on a harder task: scraping Newegg for gaming laptops. Here is the prompt:
task=(
    "Go to https://www.newegg.com/Gaming-Laptops"
    "/SubCategory/ID-3365 and find gaming laptops "
    "matching these criteria:\n"
    "- Price: $0-$1500\n"
    "- GPU: NVIDIA GeForce RTX 50 Series\n"
    "- RAM: 32GB\n"
    "For each laptop, extract the name, "
    "price, GPU, CPU, RAM, and storage. "
    "Then pick the best value and explain why."
)
This run didn’t go well. Here are the issues:
- Price filter: The agent typed the values but skipped APPLY, so the listings never refreshed.
- 32GB RAM constraint: Silently dropped. The final results all had 16GB.
- Pagination: Stopped at the first page instead of collecting results from all pages.
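Silently relaxed constraints like the 32GB one are the most dangerous failure, because the output still looks plausible. One mitigation is to validate the agent's results in your own code rather than trusting the prompt. A sketch using a Pydantic validator; the `Laptop` model and its fields are hypothetical names for illustration, not part of browser-use:

```python
from pydantic import BaseModel, field_validator

# Hypothetical output schema for the Newegg task
class Laptop(BaseModel):
    name: str
    price: float
    ram_gb: int

    @field_validator("ram_gb")
    @classmethod
    def enforce_32gb(cls, v: int) -> int:
        # Turn a silently relaxed constraint into a loud failure
        if v != 32:
            raise ValueError(f"expected 32GB RAM, got {v}GB")
        return v
```

With this in place, the 16GB results from the run above would have raised a validation error instead of quietly landing in the final answer.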
Trade-offs
browser-use is powerful, but it comes with real trade-offs:
- Speed: The agent took 30–60 seconds for the Hacker News task. Each step requires LLM reasoning, while a Playwright script would finish in ~5 seconds.
- Cost: A single run is cheap (~$0.12), but costs grow quickly. Since each step carries full context, doubling steps can cost more than 2x.
- Non-determinism: Results vary between runs. The agent may take different actions, and the judge may reach different conclusions.
- Task fit: Results depend heavily on the page and what you ask for. The same tool can work well on one site and fail on another.
When to Use Each Tool
Knowing the trade-offs, here is how to decide which tool to reach for:
| Metric | Playwright | browser-use |
|---|---|---|
| Speed | Fast | Slower |
| Cost per run | Free | Paid per LLM call |
| Deterministic | Yes | No |
| Works on a new site | Needs new selectors | Change the URL |
| Handles reasoning tasks | No (hardcoded rules) | Yes (LLM reasons) |
| Exact constraint satisfaction | Yes | No (silently relaxes hard constraints) |
Choose Playwright when:
- You need identical results every run
- Speed matters (high-volume or frequent runs)
- You need exact constraint satisfaction (e.g., “every result must have 32GB RAM”)
Choose browser-use when:
- The task requires judgment or classification (e.g., “which of these stories are about AI”) rather than pattern matching
- The task requires synthesis across multiple items (e.g., “what themes connect these”)
- The page or task may change over time and you do not want to maintain selectors
- You want scraping, classification, and reasoning in a single prompt instead of three separate pipelines
Conclusion
The best way to understand browser-use is to try it on a small project you actually care about. Here are a few ideas:
- Summarize a Reddit thread. Provide a URL and ask for the top 3 arguments. Then schedule it to run daily across selected subreddits and post summaries to Slack.
- Pull today’s top stories from a news site. Extract titles, sources, and short summaries, then schedule a daily digest sent to your inbox.
- Watch the price of a single product. Monitor a product page and get notified when the price drops below a set threshold.
Each of these starts with a single prompt, costs just a few cents to test, and can quickly become something you use every day.
Related Tutorials
- From CSS Selectors to Natural Language: Web Scraping with ScrapeGraphAI: Another natural-language scraping tool with a different architectural approach, built around graph-based extraction pipelines.