A natural language browser automation CLI flips the usual workflow on its head. Instead of opening an editor, importing a framework, hunting for a stable selector, and wiring up waits, you type one plain-English sentence into your terminal and watch a real Chrome window do the work. "Go to the staging site, log in as the demo user, add the blue backpack to the cart, and confirm the cart shows one item." That is the whole instruction. No data-testid, no page object, no await page.waitForSelector. An AI agent reads the live page, decides what to click, performs the steps, and hands back a pass/fail verdict plus any values it extracted.
This guide walks through how that actually works, where it shines, where traditional code-first automation is still the better call, and how to run it yourself with BrowserBash — a free, open-source CLI built around exactly this idea. I will be honest throughout about the rough edges, because a tool that drives a browser from an English sentence has real limits as well as real wins.
What a natural language browser automation CLI is
Strip away the marketing and the concept is simple. You describe an objective in ordinary language. A large language model interprets that objective against the actual DOM of a live page. It chooses an action — navigate, click, type, scroll, extract — executes it through a real browser, observes the result, and loops until the objective is met or it gives up. At the end you get a verdict and, when you ask for them, structured extracted values.
Compare that to the way most teams automate today. A Playwright or Selenium script is a frozen translation of intent into selectors. A human reads the goal once, translates it into getByRole('button', { name: 'Submit' }).click(), tunes the waits, and then maintains that translation forever. It is fast and deterministic. It is also brittle: rename a class, restructure a div, ship a redesign, and the script snaps. Someone has to go fix the mapping by hand.
A natural language browser automation CLI moves that translation work to runtime. The English-to-action mapping is re-derived on every run by the model, against the page as it exists right now. That is the core trade. You stop maintaining selectors and start trusting a model to find the right element each time. The benefit is dramatically less maintenance. The cost is that a model is non-deterministic and occasionally wrong, which is why how you write your instruction — and which model you point at it — matters a lot.
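To make the observe, decide, act loop described above concrete, here is a minimal sketch. It is not BrowserBash's or Stagehand's actual code: `Action`, `run_objective`, and the three callbacks are hypothetical names, and the model call is a stub you would supply yourself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str         # "navigate", "click", "type", "scroll", "extract", or "done"
    target: str = ""  # free-form description of what to act on


def run_objective(
    objective: str,
    observe: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> str:
    """Loop observe -> decide -> act until the model says "done" or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        page = observe()                           # snapshot of the live page
        action = decide(objective, page, history)  # the model picks the next action
        if action.kind == "done":
            return "passed"
        act(action)                                # perform it in the browser
        history.append(action)
    return "failed"  # gave up: step budget exhausted
```

The point of the sketch is that `decide` runs against a fresh `observe()` on every iteration. That is the runtime re-derivation that replaces a frozen selector, and it is also where the non-determinism comes from.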
The terminal is the right home for this
You could do natural-language browser control inside a chat app or a hosted dashboard. Plenty of products do. But the terminal has properties those interfaces lack. It is scriptable. It composes with everything else on your machine through pipes, exit codes, and shell loops. It drops cleanly into CI without a browser tab or a login. And it fits the way developers and SDETs already work: a command, an output, a non-zero exit code when something breaks. A CLI keeps natural-language automation in the same place as your git commits, your test runner, and your deploy scripts.
How BrowserBash drives a real browser from one sentence
BrowserBash is a CLI from The Testing Academy (founder Pramod Dutta), distributed under Apache-2.0. You install it once and run it from anywhere:
npm install -g browserbash-cli
browserbash run "go to https://www.saucedemo.com, log in as standard_user, add the first product to the cart, and confirm the cart badge shows 1"
That single run command does the whole loop. BrowserBash launches your local Chrome, an agent reads the page, and it works through the objective step by step. There is no script file to write and no selector to maintain. When it finishes it prints whether the objective passed or failed and any data it pulled out along the way. The only prerequisites for the default local provider are Node 18 or newer and an installed Chrome. No account, no signup, nothing to configure before your first run.
Under the hood, two pieces matter: the engine that interprets your English, and the model that powers the reasoning.
Engines: who reads the English
BrowserBash ships with two engines, switchable per run with --engine.
The default is Stagehand, the open-source (MIT) framework from Browserbase. It exposes act, extract, observe, and agent primitives on top of Playwright, and it self-heals — instructions like "click the checkout button" get resolved at runtime, so a redesign does not automatically break your run. Stagehand is a real, widely-used project in its own right, and BrowserBash uses it as the default brain for interpreting objectives.
The second is builtin, an in-repo Anthropic tool-use loop that drives Playwright directly. BrowserBash switches to it automatically when you target LambdaTest or BrowserStack, and you can pin it yourself. It is the engine that also writes a Playwright trace when you record a run.
browserbash run 'search for a wireless keyboard under $50 and list the top 3 results with prices' --engine stagehand
You rarely need to think about engines for everyday use. The default works for most local automation. The split exists so that grid providers and trace-heavy debugging get an engine tuned for them.
Models: Ollama-first, and what that means for your bill
This is where BrowserBash makes an opinionated choice that sets it apart from many hosted alternatives. The default model is auto, and it resolves in a specific order:
- A local Ollama install, used as ollama/<model>: free, no API keys, and nothing leaves your machine.
- An ANTHROPIC_API_KEY in your environment, resolving to claude-opus-4-8.
- An OPENAI_API_KEY, resolving to openai/gpt-4.1.
- If none of those are present, a clear error telling you how to fix it.
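The resolution order reads naturally as a short chain of checks. The sketch below only illustrates that order. `resolve_auto_model` and its parameters are hypothetical, and how BrowserBash actually detects a local Ollama install is not shown here, so that detection is passed in as an argument.

```python
from typing import Mapping, Optional


def resolve_auto_model(env: Mapping[str, str], local_ollama_model: Optional[str]) -> str:
    """Return a model id following the documented order: Ollama, Anthropic, OpenAI."""
    if local_ollama_model:            # 1. a local Ollama model wins when present
        return f"ollama/{local_ollama_model}"
    if env.get("ANTHROPIC_API_KEY"):  # 2. then an Anthropic key
        return "claude-opus-4-8"
    if env.get("OPENAI_API_KEY"):     # 3. then an OpenAI key
        return "openai/gpt-4.1"
    # 4. nothing usable: fail with a message that says how to fix it
    raise RuntimeError(
        "No model available: install Ollama, or set ANTHROPIC_API_KEY or OPENAI_API_KEY"
    )
```

Note the consequence of the ordering: with both Ollama and an API key present, the local model is chosen, so a hosted model has to be pinned explicitly with --model.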
The headline here is the local path. If you run a model through Ollama, every step of the browser automation happens on your hardware. No prompts, page content, or extracted data go to a third party, and you pay nothing in model API fees. For privacy-sensitive flows (internal admin tools, anything touching customer data, regulated environments) that is a genuinely different posture from sending every page to a hosted API.
Here is the honest caveat, and it is an important one. Very small local models, roughly 8 billion parameters or under, are flaky on long multi-step objectives. They will handle "search for X and tell me the first result" but stumble on "log in, navigate three pages deep, fill a multi-section form, and verify a confirmation." The sweet spot for local is a mid-size model — Qwen3 or a Llama 3.3 70B-class model — which has enough reasoning headroom for real flows. If you need maximum reliability on a hard objective, a capable hosted model (Claude, GPT-4.1, Gemini) is still the safer bet. Pick the model to match the difficulty of the job.
You can pin any backend explicitly:
browserbash run "extract the pricing tiers and their monthly prices from this page" --model ollama/qwen3
browserbash run "complete the multi-step signup wizard with a throwaway email" --model claude-opus-4-8
Beyond Ollama and Claude, you can point --model at openai/gpt-4.1, google/gemini-2.5-flash, an openrouter/<vendor>/<model> route like openrouter/meta-llama/llama-3.3-70b-instruct, or an Anthropic-compatible gateway via ANTHROPIC_BASE_URL. That range means you can start free on local, then graduate one hard flow to a hosted model without changing anything else about how you run.
Where the browser actually runs
Interpreting English is one axis; where the Chrome window lives is another. BrowserBash controls that with --provider.
The default is local — your own Chrome on your own machine. For most development and debugging, that is what you want, because you can watch the run happen in a visible window. Add --headless when you do not need to see it.
The other providers let you push the same English objective to remote infrastructure without rewriting it:
- cdp connects to any Chrome DevTools Protocol endpoint via --cdp-endpoint ws://..., so you can attach to a browser you are already running or one in a container.
- browserbase runs on Browserbase's cloud (needs BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID).
- lambdatest and browserstack run on those grids for cross-browser coverage; both need their respective credentials and both automatically use the builtin engine.
browserbash run "load the marketing homepage and confirm the hero headline and primary CTA button are visible" --provider lambdatest --headless
The point of separating provider from engine from model is that your instruction is portable. The same sentence that ran against your local Chrome can run against a Safari instance on a cloud grid by changing one flag. You are not rewriting selectors for each environment, because there are no selectors.
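If you launch runs from a script, that portability means the objective string stays constant and only the flags change. Here is a small sketch of an argument builder that uses only the flags mentioned in this guide. `build_run_command` is a hypothetical helper, and the `ws://localhost:9222` endpoint in the usage note is an assumed example value.

```python
from typing import List, Optional

KNOWN_PROVIDERS = {"local", "cdp", "browserbase", "lambdatest", "browserstack"}


def build_run_command(
    objective: str,
    provider: str = "local",
    headless: bool = False,
    cdp_endpoint: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `browserbash run`; the objective is never rewritten."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    argv = ["browserbash", "run", objective]
    if provider != "local":  # local is the default, so no flag is needed
        argv += ["--provider", provider]
    if provider == "cdp":
        if not cdp_endpoint:
            raise ValueError("the cdp provider needs a --cdp-endpoint value")
        argv += ["--cdp-endpoint", cdp_endpoint]
    if headless:
        argv.append("--headless")
    return argv
```

Passing the list to `subprocess.run(argv)` avoids shell quoting problems in the objective. For example, `build_run_command(objective, provider="cdp", cdp_endpoint="ws://localhost:9222")` and `build_run_command(objective, provider="lambdatest", headless=True)` carry the same sentence to two different environments.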
Writing objectives that actually work
The biggest skill in a natural language browser automation CLI is writing a good objective. A vague instruction gives the model too much room to wander; an over-specified one fights the model's strengths. After enough runs you converge on a style that reads like instructions you would give a careful new hire.
Be concrete about the goal and the success condition. "Test the login" is weak. "Log in as standard_user with password secret_sauce and confirm the inventory page loads with at least one product card" tells the agent exactly when it has succeeded. Name the values you want extracted. Anchor navigation with a URL when you can. Keep one objective to one coherent task rather than cramming five unrelated checks into a run.
Two things make a measurable difference to reliability:
- Give an explicit verification step. End the objective with what "done" looks like. The agent uses that to decide pass versus fail, and a sharp success condition cuts false passes.
- Match model size to objective length. A three-step objective is fine on a small local model. A twelve-step flow wants a mid-size local model or a hosted one. Trying to push a long flow through an 8B model is the single most common cause of a flaky run.
The tutorials and the learn section go deeper on phrasing patterns, and the guidance on writing reliable objectives is worth reading before you automate anything important.
Markdown tests: committable plain-English flows
A one-shot run is great for ad-hoc work, but real teams need automation they can commit, review in a pull request, and run repeatedly. BrowserBash handles that with markdown tests — *_test.md files where each list item is a step in plain English.
browserbash testmd run ./checkout_test.md
A markdown test reads like documentation because it is documentation. Each bullet is a step. You can template values with {{variables}}, compose files with @import so shared setup lives in one place, and mark variables as secret so they show up as ***** in every log line instead of leaking into your terminal or CI output. After each run, BrowserBash writes a human-readable Result.md so anyone — including non-engineers — can see what happened without reading a stack trace.
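To show what templating and secret masking amount to, here is a rough sketch of the two ideas. This is not BrowserBash's implementation: `render_step` and `mask_secrets` are hypothetical names, the exact placeholder grammar is an assumption, and the URL and password are made-up values.

```python
import re
from typing import Iterable, Mapping

# Assumed placeholder grammar: {{name}} with a plain identifier inside.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_step(step: str, variables: Mapping[str, str]) -> str:
    """Substitute {{name}} placeholders; an unknown name raises KeyError."""
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], step)


def mask_secrets(line: str, variables: Mapping[str, str], secret_names: Iterable[str]) -> str:
    """Replace every occurrence of a secret variable's value with ***** before logging."""
    for name in secret_names:
        value = variables.get(name)
        if value:
            line = line.replace(value, "*****")
    return line


variables = {"base_url": "https://staging.example.com", "password": "hunter2"}
step = render_step("Go to {{base_url}} and log in with password {{password}}", variables)
logged = mask_secrets(step, variables, ["password"])
```

The design point is that masking happens on the way out to logs, not in the step the agent receives: the agent still needs the real value to type it, while `logged` is what should reach a terminal or CI output.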
This is the format that makes natural-language automation maintainable at a team level. A product manager can read the test. A reviewer can diff a one-line English change instead of a forty-line selector refactor. And because the steps are intent rather than implementation, a frontend redesign does not require rewriting the file. The markdown tests tutorial covers variables, imports, and secret masking in detail.
Built for CI and AI coding agents: agent mode
Plain prose output is fine for a human watching a terminal. It is useless for a pipeline or another program. That is what --agent mode is for: it emits NDJSON, one JSON object per line, with no prose to parse.
You get progress events as the run proceeds — objects like {"type":"step","step":1,"status":"passed","action":"navigate","remark":"..."} — and a single terminal event when it ends: {"type":"run_end","status":"passed","summary":"...","final_state":{...},"duration_ms":...}. The exit codes are stable and scriptable: 0 passed, 1 failed, 2 error, 3 timeout.
browserbash run "verify the contact form submits and shows a thank-you message" --agent
Those exit codes are the whole point in CI. A pipeline step that exits non-zero fails the build, no log scraping required. And NDJSON is exactly what an AI coding agent — Claude Code, a CI bot, your own orchestration script — wants to consume: structured events it can react to step by step. If you are wiring BrowserBash into a GitHub Actions workflow or a Jenkins pipeline, agent mode plus exit codes is the integration surface you build against.
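For a sense of what consuming that stream looks like, here is a minimal sketch of a reader. It assumes only the two event shapes quoted above (`step` and `run_end`); `summarize_run` is a hypothetical helper, and any other event types are simply ignored.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple

# Exit code meanings as documented for agent mode.
EXIT_MEANING: Dict[int, str] = {0: "passed", 1: "failed", 2: "error", 3: "timeout"}


def summarize_run(lines: Iterable[str]) -> Tuple[List[dict], Optional[dict]]:
    """Split an NDJSON stream into step events and the final run_end event (if any)."""
    steps: List[dict] = []
    run_end: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # tolerate blank lines between events
        event = json.loads(line)      # one JSON object per line
        if event.get("type") == "step":
            steps.append(event)
        elif event.get("type") == "run_end":
            run_end = event
    return steps, run_end
```

In a pipeline you would feed this the stdout lines of a run started with --agent and still let the process exit code gate the build. The parsed events are for reporting and for agents that want to react step by step.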
Recording, dashboards, and where your data goes
When a run fails and you need to see why, add --record. BrowserBash captures a screenshot and a .webm session video using a bundled ffmpeg, and the builtin engine also writes a Playwright trace you can open in the Playwright trace viewer. That gives you a frame-by-frame replay of what the agent saw and did.
For a broader view, browserbash dashboard starts a local dashboard at localhost:4477. It is fully local — your runs, your machine, no network. Pass --clear to wipe the store. You can also open the dashboard for a specific run with --dashboard.
There is an optional cloud dashboard too, and the privacy model here is worth stating plainly. By default, nothing leaves your machine. To use the cloud, you first link it with browserbash connect --key bb_..., and then you must add --upload to each individual run you want pushed. Without --upload, a run stays entirely local. Free cloud runs are kept for 15 days. The opt-in is per-run and explicit, which matters if some of your flows touch data you would never want off-box. Every run is also kept on disk at ~/.browserbash/runs with secrets masked, capped at the most recent 200.
How it compares to other natural-language tools
BrowserBash is not the only way to drive a browser from plain English, and it is worth being clear about where it sits. Several strong projects overlap with parts of what it does.
| Tool | Form factor | Natural language | Local/free model path | Best fit |
|---|---|---|---|---|
| BrowserBash | CLI (browserbash) | Objective per run or markdown tests | Yes: Ollama-first, $0 on local | Terminal/CI-driven automation and testing with selector-free flows |
| Stagehand | TypeScript/Python SDK | act/extract/observe/agent primitives | Bring your own model keys | Developers embedding AI browser control inside their own code |
| Browser Use | Python library | Goal-driven agent loop | Bring your own model keys | Python developers building autonomous web agents in code |
| Hosted "operator"-style agents | Web app / chat | Conversational tasks | Hosted models only | Non-developers running one-off web tasks in a browser tab |
A few honest notes on that table. Stagehand is excellent, and BrowserBash actually uses it as its default engine — so this is less "versus" and more "BrowserBash packages Stagehand into a CLI with model resolution, providers, markdown tests, and CI output around it." If you are writing TypeScript and want AI primitives inside your own codebase rather than a command-line tool, reach for Stagehand directly; that is what it is built for. Browser Use is the better fit if you live in Python and want to compose an agent loop programmatically. Hosted operator-style products win when the user is not a developer and just wants to type a task into a browser without touching a terminal — exact model and pricing details for those vary by vendor and are not always publicly specified, so check current docs before assuming.
Where BrowserBash earns its place is the combination: a real CLI, an Ollama-first model story that can run genuinely free and fully local, committable markdown tests, NDJSON for agents and CI, and one-flag portability across local Chrome, CDP, Browserbase, LambdaTest, and BrowserStack. If your work happens in a terminal and a pipeline, that bundle is hard to assemble from parts. You can see more on the features page and read through real walkthroughs on the blog.
When to choose a natural-language CLI — and when not to
Honesty matters more than enthusiasm here, so here is the balanced view.
Reach for a natural language browser automation CLI when:
- Your selectors break constantly and maintenance is eating your week. Moving the English-to-element mapping to runtime is the whole win.
- You are writing smoke tests, synthetic monitoring, or onboarding/checkout flow checks that change shape often.
- You want non-engineers to read and review tests. Markdown tests are diff-friendly English.
- Privacy or budget rules out sending pages to a hosted API. The Ollama-first local path keeps everything on-box at zero model cost.
- You are giving an AI coding agent a browser. NDJSON plus stable exit codes is purpose-built for that.
Stick with code-first Playwright or Selenium when:
- You need millisecond-precise, fully deterministic behavior on a flow that almost never changes. A hand-tuned selector is faster and behaves the same way on every run.
- Your objective is enormous and unbroken, and you only have a tiny local model. The small-model caveat is real; either break the flow up, use a mid-size local model, or use a hosted one.
- You require exact byte-level assertions or low-level network/timing control that lives below the level of "click the thing that says X."
The good news is that this is not all-or-nothing. Many teams run natural-language checks for the high-churn, high-maintenance surface area and keep a thin layer of deterministic code-first tests for the few flows that demand it. BrowserBash being free and local means trying it costs you nothing but the time of one run command. For a sense of how teams are actually mixing these, the case study and pricing pages are a useful next stop.
A realistic first hour
If you want to feel the difference rather than read about it, here is a path that takes under an hour. Install the CLI and run one objective against a site you know, watching the visible Chrome window so you can see the agent reason. Then run it again with --record and open the video to see exactly what it saw. Move one real flow into a *_test.md file, parameterize the environment URL with a {{variable}}, and commit it. Wire that markdown test into CI with --agent and let the exit code gate your build. Finally, if any flow needs more reliability than your local model gives, pin a hosted model for that one run and leave the rest free and local.
By the end you will have a feel for the two things that decide whether this works for you: how you phrase objectives, and which model you point at the hard ones. Both are learnable in an afternoon, and the learn section is built to shorten that curve. The full source lives on GitHub if you want to read exactly how the loop works.
FAQ
What is a natural language browser automation CLI?
It is a command-line tool where you type a plain-English objective and an AI agent drives a real browser to complete it — no selectors, no scripts, no page objects. The agent reads the live page, decides what to click and type, performs each step, and returns a pass or fail verdict plus any data it extracted. BrowserBash is one such CLI: you install it with npm and run objectives directly from your terminal.
Is BrowserBash free, and does my data leave my machine?
BrowserBash is free and open-source under Apache-2.0, with no account required to run it. When you use a local model through Ollama, nothing leaves your machine and your model bill is zero, because every step runs on your own hardware. Cloud upload is strictly opt-in: you must link your account and add the --upload flag per run, and without it everything stays local.
Do I need an API key or a paid model to use it?
No. The default auto model setting prefers a local Ollama install first, which needs no keys and costs nothing. If you would rather use a hosted model, BrowserBash will pick up an ANTHROPIC_API_KEY or OPENAI_API_KEY from your environment, and you can also point --model at OpenRouter, Gemini, or an Anthropic-compatible gateway. Very small local models can be unreliable on long flows, so a mid-size local model or a hosted one is recommended for hard, multi-step objectives.
Can it run in CI and cross-browser environments?
Yes. Agent mode (--agent) emits NDJSON with stable exit codes — 0 passed, 1 failed, 2 error, 3 timeout — so a pipeline step fails the build without any log parsing. The same English objective runs on your local Chrome by default or on remote infrastructure by changing one flag, including CDP endpoints, Browserbase, LambdaTest, and BrowserStack for cross-browser coverage. Markdown tests give you committable, reviewable flows that fit naturally into a CI workflow.
Ready to try it? Install the CLI and run your first objective in under a minute:
npm install -g browserbash-cli
No account is needed to run anything, but if you want the optional cloud dashboard later, you can sign up here.