If you spend your day in a terminal, you have probably noticed how many browser automation tools now expect you to live somewhere else — a SaaS dashboard, a low-code recorder, a hosted IDE. The best CLI browser automation tools in 2026 push back against that. They run from the command line, take a plain-English objective or a script, drive a real browser, and hand you back structured output you can pipe into CI or feed to an AI coding agent. No tab-switching, no clicking through a wizard.
This roundup compares four terminal-native options that keep coming up: Kane CLI from TestMu AI, Vercel Labs' agent-browser, Stagehand from Browserbase (technically an SDK, but it belongs in this conversation), and BrowserBash. I have used these the way an SDET actually uses them — wiring them into pipelines, watching them fail on flaky flows, checking what artifacts they leave behind. Where a competitor's pricing, model list, or architecture isn't publicly documented, I say so rather than guess. The goal is a decision you can defend, not a leaderboard.
What "CLI browser automation" means in 2026
A few years ago, "browser automation from the CLI" meant a Selenium or Playwright script you wrote in code and ran with node or pytest. That still works and is still the right answer for a lot of teams. What changed is the arrival of a new layer: tools where the instruction is natural language, and an LLM decides which buttons to click and which fields to fill, reading the page the way a person would instead of matching a brittle CSS selector.
That shift matters for two audiences. The first is AI coding agents — Claude Code, Codex CLI, Cursor, Gemini CLI — that generate web code and then need to verify it works in a browser. They can't read a screenshot reliably enough to trust it, so they shell out to a CLI that returns machine-readable results. The second is engineers who want to write a smoke test in one sentence and commit it, without maintaining a page object model.
The comparison below is organized around the four dimensions that actually differentiate these tools in practice: openness, the model story, CI output, and recordings.
The contenders at a glance
Here is the high-level shape of each tool before we go deep. Read the caveats in the cells — several of these facts are deliberately hedged because the vendors haven't published specifics as of 2026.
| Tool | Maker | License | Primary audience | Where the browser runs |
|---|---|---|---|---|
| Kane CLI | TestMu AI (formerly LambdaTest) | Apache-2.0 (CLI) | AI coding agents + developers | Local Chrome over CDP; cloud grid available |
| agent-browser | Vercel Labs | Open source (see repo) | AI coding agents | Local Chrome for Testing; CDP; Safari via WebDriver |
| Stagehand | Browserbase | MIT | Developers building agents (SDK) | Local Playwright or Browserbase cloud |
| BrowserBash | The Testing Academy | Apache-2.0 | Developers, SDETs, AI agents | Local Chrome (default); CDP; cloud providers |
A clarification up front: Stagehand is an SDK, not a CLI. You import it into TypeScript or Python and call its primitives in code. I'm including it because it's the engine inside several CLIs (BrowserBash's default engine is Stagehand) and because anyone shopping for "natural-language browser automation" will run into it. If you want a literal command you type and run, Stagehand alone is not it — but it's foundational to the category.
Openness: license, account, and where your data goes
Openness is more than a license badge. The questions that actually bite you later are: can I run this with no account, can I self-host it, and does my page data leave my machine?
Kane CLI publishes its CLI under Apache-2.0, which is genuinely permissive. The practical catch is that you authenticate with kane-cli login using credentials from a TestMu AI account, so while the code is open, the default workflow assumes a sign-in. That's a reasonable trade for a team that already lives in the LambdaTest/TestMu ecosystem and wants the cloud grid, but it's friction if you want a zero-account local tool.
agent-browser is open source from Vercel Labs and is unusual here for being built primarily in Rust (a Rust CLI talking to a Node.js daemon that manages Playwright). Its whole design goal is token efficiency for agents — compact, ref-based accessibility-tree snapshots instead of verbose DOM dumps, a clever answer to the "agents burn context reading pages" problem. Check the repository directly for the current license terms, since I won't assert one that isn't confirmed here.
Stagehand is MIT licensed, which is about as open as it gets. You can drop it into any project. The nuance is operational: Stagehand the SDK is free, but the natural complement is Browserbase's hosted browser infrastructure, which is a paid product. You can absolutely run Stagehand against local Playwright with no cloud, so the openness is real — just know that the "happy path" Browserbase markets is the cloud one.
BrowserBash is Apache-2.0, free, and — the part that distinguishes it — needs no account to run at all. You npm install -g browserbash-cli and immediately browserbash run "..." against your own Chrome. There's no login, no key required for the default path. On local models nothing leaves your machine, which makes it the easy pick when the page you're automating is behind a corporate firewall or contains data you can't ship to a third party. There's an optional local dashboard at localhost:4477 that's also fully local, and an opt-in cloud dashboard you only touch if you explicitly pass --upload.
If your hard requirement is "must run with no account and no data egress," BrowserBash and a self-hosted Stagehand are the two that clear the bar cleanly. Kane is open but account-oriented by default. agent-browser is local-first by design, but check the repo for its current license terms.
The model story: who pays for inference, and can it run locally
This is where the tools diverge most, and it's the dimension teams underestimate. Every natural-language browser tool needs an LLM to interpret the page and decide actions. The question is which model, where it runs, and who pays.
Most tools in this space assume a hosted frontier model — Claude, GPT, or Gemini — which means an API key and a per-token bill that scales with how many steps your flows take. Multi-step flows are token-hungry, so a large suite can get expensive. None of the competitor pricing here is something I'll invent; if you need exact numbers, get them from the vendor, because they aren't uniformly published as of 2026.
Kane CLI is built to slot into agent workflows and works with hosted models; its tie to the TestMu cloud suggests a managed-inference angle, but I won't state a pricing model that isn't documented here. agent-browser is model-agnostic from the agent's side — it's the execution layer, and the agent driving it (Claude Code, Cursor, etc.) supplies the intelligence, so your model cost is wherever your coding agent already bills. Stagehand lets you bring your own model and supports several providers; you pay your chosen provider directly.
BrowserBash takes a deliberately different stance: it's Ollama-first. The default model setting is auto, which resolves in a specific order — first a local Ollama install (ollama/<model>, free, no keys, nothing leaves your machine), then ANTHROPIC_API_KEY if present (claude-opus-4-8), then OPENAI_API_KEY (openai/gpt-4.1), otherwise it errors with guidance. The headline benefit is a guaranteed $0 model bill when you run local. You can pin a backend explicitly:
# Free, fully local — needs Ollama running, no API keys
browserbash run "Log in as test@example.com / hunter2 and confirm the dashboard greeting shows my name" --model ollama/qwen3
# Hosted Claude for a hard multi-step flow
export ANTHROPIC_API_KEY=sk-ant-...
browserbash run "Add two items to the cart, apply code SAVE10, and verify the discount line" --model claude-opus-4-8
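The resolution order for auto described above can be sketched as a small shell function. This only illustrates the documented precedence and is not BrowserBash's actual implementation: the name resolve_model and its first argument (whether a local Ollama install was found) are hypothetical, and the default local model name is borrowed from the example above.

```shell
# Hypothetical sketch of the documented `auto` order; not BrowserBash's real code.
# $1: "yes" if a local Ollama install was detected (detection itself is not covered here)
# $2: local model name to use with Ollama (assumed default: qwen3)
resolve_model() {
  if [ "$1" = "yes" ]; then
    echo "ollama/${2:-qwen3}"            # free, local, no keys
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "claude-opus-4-8"               # hosted Claude
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai/gpt-4.1"                # hosted OpenAI
  else
    echo "no model available: start Ollama or set an API key" >&2
    return 1
  fi
}
```

Reading it top to bottom makes the cost implication visible: a hosted key is only consulted when no local model is available, so an API key in your environment does not by itself trigger a bill.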
Here's the honest caveat, because it's the thing nobody tells you: very small local models (roughly 8B parameters and under) are flaky on long, multi-step objectives. They lose the plot halfway through a checkout. The sweet spot for reliable local runs is a mid-size model — Qwen3 or a Llama 3.3 70B-class model — or a capable hosted model when the flow is genuinely hard. BrowserBash lets you mix: cheap local models for simple smoke checks, a hosted model for the gnarly regression flow. You can also route through OpenRouter (openrouter/<vendor>/<model>) or an Anthropic-compatible gateway. The model selection guide walks through picking one.
If avoiding a per-run model bill is your priority, BrowserBash's local-first default is the standout. If you're already paying for a coding agent's tokens, agent-browser's "the agent supplies the brain" model means no separate inference line item. Stagehand's bring-your-own-model is the most flexible if you have provider preferences.
CI output: structured results an agent or pipeline can trust
A browser tool is only useful in CI if it returns something a machine can branch on without parsing English. This is the dimension where the category has genuinely converged, and it's good news.
Kane CLI ships an agent mode (--agent --headless) that emits structured NDJSON — one JSON object per line — explicitly so Claude Code, Codex CLI, and Gemini CLI can read progress and decide what to do next. agent-browser is built around compact, structured output as its core value proposition; its 50-plus commands return machine-parseable results designed to minimize the context an agent spends reading them. Stagehand, being an SDK, returns typed objects directly in your code — extract() gives you schema-validated data, observe() returns candidate actions — which is arguably the cleanest contract of all because it's just function return values.
BrowserBash uses --agent to emit NDJSON, one JSON object per line. Progress events look like {"type":"step","step":1,"status":"passed","action":"navigate","remark":"..."} and the run ends with a terminal event: {"type":"run_end","status":"passed|failed|error|timeout","summary":"...","final_state":{...},"duration_ms":...}. Critically, the exit codes are stable and conventional: 0 passed, 1 failed, 2 error, 3 timeout. That means a CI gate is a one-liner — no prose parsing, no regex over a summary string.
# Headless CI run; branch on the exit code, capture the NDJSON stream
browserbash run "Open the pricing page and verify the Pro plan lists 'Unlimited seats'" \
--headless --agent > result.ndjson
echo "exit code: $?" # 0 pass, 1 fail, 2 error, 3 timeout
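For a slightly richer gate, the two helpers below turn the documented exit codes into a label and count step events that did not pass. Both names (describe_exit, count_failed_steps) are hypothetical, and the grep-based count assumes compact, one-object-per-line JSON shaped like the sample event above; a real JSON parser is the safer choice if the formatting ever varies.

```shell
# Map the documented exit codes (0/1/2/3) to a label a CI log can show.
describe_exit() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed" ;;
    2) echo "error" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;
  esac
}

# Count step events whose status is anything other than "passed".
# $1: path to the NDJSON file captured from an --agent run.
count_failed_steps() {
  # grep -c exits non-zero when the count is 0, so keep the function's status clean.
  grep '"type":"step"' "$1" | grep -vc '"status":"passed"' || true
}
```

In a pipeline, capture `$?` immediately after the run, log `describe_exit "$code"` alongside `count_failed_steps result.ndjson`, and finish with `exit "$code"` so the job status mirrors the run.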
If you're feeding an AI coding agent, all four are credible — this is now table stakes. The tiebreaker is the rest of the surface. BrowserBash's committable Markdown tests (*_test.md, one step per list item, {{variables}} templating, @import composition, and secret-marked values masked as ***** in every log line) make it pleasant for a human-and-agent shared workflow. There's a deeper walkthrough in the agent-mode tutorials.
Recordings and artifacts: what you get to look at after a run
When a flow fails at 2am, the difference between a five-minute fix and an hour of guessing is what the tool captured. Screenshots, video, and traces are the artifacts that matter.
Kane CLI emphasizes autoheal and smart waiting (adapting when a button moves or a class changes) and, like other tools in the LambdaTest lineage, is oriented toward producing reviewable test evidence — though I'll point you to their docs for the exact artifact list rather than overstate it. agent-browser can take screenshots as one of its commands and supports video recording per its references; it's an execution layer, so the artifacts are the building blocks rather than a packaged report. Stagehand, as an SDK, leaves artifact capture largely to you and to Playwright underneath — you get Playwright's full tracing if you wire it up, which is powerful but is your code to write.
BrowserBash bundles this with a single flag. Pass --record and you get a screenshot plus a .webm session video via bundled ffmpeg; on the builtin engine it also writes a full Playwright trace you can open in the trace viewer. Every run is also kept on disk at ~/.browserbash/runs (secrets masked, capped at 200 runs), so you have a local history without any cloud. That on-disk store plus the local dashboard means you can review a failed run entirely offline.
# Capture a screenshot + .webm video; builtin engine also writes a Playwright trace
browserbash run "Complete checkout with the saved test card and confirm the order number appears" \
--engine builtin --record
The honest framing: if you want recordings as a one-flag default with zero setup, BrowserBash is the most batteries-included here. If you're comfortable wiring Playwright tracing yourself, Stagehand gives you the same underlying power with more control. For agent-browser and Kane, treat the artifact story as "supported, check the current docs for the exact format."
Engines and providers: the flexibility layer
One thing that separates BrowserBash architecturally is the split between engine (who interprets the English) and provider (where the browser runs). The default engine is stagehand — yes, the same Stagehand from Browserbase, MIT-licensed, with its act/extract/observe/agent primitives and self-healing. The alternative is builtin, an in-repo Anthropic tool-use loop driving Playwright, which is auto-selected for LambdaTest and BrowserStack runs. You switch with --engine stagehand|builtin.
On providers, --provider local (your Chrome) is the default, but you can point at any DevTools endpoint with --provider cdp --cdp-endpoint ws://..., or run on Browserbase, LambdaTest, or BrowserStack with the relevant credentials. That CDP escape hatch is genuinely useful — it means BrowserBash can drive a browser that something else launched, including containers and remote grids.
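To make the CDP escape hatch concrete: Chrome started with --remote-debugging-port serves a small JSON document at /json/version that includes a webSocketDebuggerUrl field. The sketch below pulls that URL out with sed so it can be handed to --cdp-endpoint. The helper name cdp_ws_url and the port 9222 used in the note afterwards are assumptions for illustration, and the pattern expects the field to sit on a single line.

```shell
# Hypothetical helper: read /json/version output on stdin, print the WebSocket URL.
# Prints nothing if the field is absent.
cdp_ws_url() {
  sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

With a browser listening on port 9222, something like `curl -s http://localhost:9222/json/version | cdp_ws_url` should print the endpoint to pass along with --provider cdp.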
This is where the tools quietly overlap. agent-browser also supports CDP and even Safari via WebDriver; Kane drives local Chrome over CDP and offers a cloud grid. The "drive a real browser, locally or remote, your choice" pattern is becoming standard — good for you, because switching costs between these tools are lower than the marketing implies.
When to choose which tool
No tool wins on every axis. Here's the balanced read, including where each competitor is genuinely the better pick.
Choose Kane CLI if your team is already in the LambdaTest/TestMu ecosystem, you want native adapters for Claude Code, Cursor, Codex CLI, and Gemini CLI out of the box, and the account-based login plus cloud grid is a feature rather than friction for you. Its plain-English-to-Playwright code generation and human-pause-on-OTP/CAPTCHA behavior are real conveniences for verification-heavy agent workflows.
Choose agent-browser if your primary use is feeding an AI coding agent and you care most about context efficiency. The Rust core and compact, ref-based snapshots are purpose-built to keep an agent's token budget low, which matters a lot when the agent runs dozens of browser steps per task. If you're optimizing the cost and reliability of an agent loop specifically, this is the most focused tool here.
Choose Stagehand if you're building automation in code and want maximum control. As an MIT-licensed SDK with typed extraction and clean primitives, it's the right foundation when you're writing a TypeScript or Python application rather than typing commands. Pair it with Browserbase if you want managed cloud browsers. It is a better fit than any CLI when "this is part of my app, not a one-off command" is true.
Choose BrowserBash if you want a free, account-free, local-first CLI where the default is $0 model inference. It's the strongest pick when data can't leave your machine, when you want recordings and a run history with zero setup, and when you want one tool that works both as a human-typed command and as an agent-driven NDJSON producer. The honest limitation is the model one: lean on a mid-size local model or a hosted model for hard flows, not a tiny 8B model. See the feature overview and real case studies for how teams use it.
A practical hybrid several teams land on: use BrowserBash locally for fast, free iteration and committed Markdown smoke tests, and reach for a hosted model (or a cloud provider) only on the handful of flows that are genuinely hard. The pricing page confirms the CLI itself stays free regardless.
Side-by-side on the four dimensions
To pull it together, here are the same four tools compared on the dimensions this roundup is built around, with a few rows split out for detail. "Not publicly specified" means exactly that: I'm not inventing a value.
| Dimension | Kane CLI | agent-browser | Stagehand | BrowserBash |
|---|---|---|---|---|
| Run with no account | No (login required) | Yes | Yes (SDK) | Yes |
| Free local model path | Not publicly specified | Via your agent | Bring your own | Yes (Ollama-first, $0) |
| NDJSON / structured CI output | Yes (--agent) | Yes (compact) | Typed return values | Yes (--agent) |
| Stable CI exit codes | Yes | Yes | N/A (SDK) | Yes (0/1/2/3) |
| Recordings as one flag | Check docs | Supported | Your code (Playwright) | Yes (--record) |
| Committable Markdown tests | Yes | Not publicly specified | N/A (SDK) | Yes (*_test.md) |
| License (CLI/SDK) | Apache-2.0 | See repo | MIT | Apache-2.0 |
The pattern that emerges: the contract (NDJSON, exit codes, real browsers, plain English) has standardized across the category, which is great. The defaults are what differ — and the biggest default split is cost and data locality, where BrowserBash's local-first, account-free, Ollama-first stance is the clearest differentiator, while Kane's ecosystem integration and agent-browser's context efficiency are theirs.
How to actually evaluate them this week
Don't pick from a table. Pick from a spike. Take one real flow from your app (ideally a login plus one meaningful action, since that exercises the parts that break) and run it through two of these tools end to end. Watch three things: whether it completes the flow reliably across five runs, what it leaves you when it fails, and how much each run costs in tokens or time.
For BrowserBash that spike is two commands — install, then run against your own Chrome with a local model — and you'll know within ten minutes whether the model you picked is strong enough. If a small local model stumbles, bump to a 70B-class or hosted model and run it again before judging the tool. The getting-started tutorials have copy-paste flows for login, search, and checkout. The same disciplined spike works for the others — just budget for the account setup on Kane and the code-writing on Stagehand.
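The five-run reliability check can be scripted with a small tool-agnostic loop. This is a minimal sketch: spike_runs is a hypothetical helper, it treats exit code 0 as a pass and everything else as a failure, and it discards each run's output, so in a real spike you would redirect the output to per-run files instead.

```shell
# Run a command N times and report how many runs exited with code 0.
# Usage: spike_runs <count> <command> [args...]
# Note: uses plain (global) variables to stay portable across POSIX shells.
spike_runs() {
  runs="$1"; shift
  passed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "${passed}/${runs} passed"
}
```

For example, `spike_runs 5 browserbash run "..." --headless --agent` would report how many of the five attempts passed; because the loop only looks at exit codes, the same helper works for any of the CLIs in this roundup.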
FAQ
What is the best CLI browser automation tool in 2026?
There's no single winner — it depends on your constraint. agent-browser is excellent for feeding AI coding agents efficiently, Kane CLI suits teams in the TestMu/LambdaTest ecosystem, and Stagehand is the best SDK if you're automating in code. BrowserBash is the strongest pick if you want a free, account-free, local-first CLI with a $0 model path. Match the tool to whether you care most about cost, ecosystem, or control.
Can I run browser automation from the CLI without API keys or an account?
Yes. BrowserBash runs with no account and no API key on its default local path — it uses a local Ollama model so nothing leaves your machine and there's no per-token bill. You install it with one npm command and run immediately against your own Chrome. Other tools in this space typically assume either a hosted model key or a vendor account, so check each tool's default before assuming it's free.
Do these CLI tools work with AI coding agents like Claude Code and Cursor?
Yes, all four are built with agents in mind. Kane CLI ships native adapters for Claude Code, Cursor, Codex CLI, and Gemini CLI; agent-browser is designed specifically as an agent execution layer; and BrowserBash emits NDJSON with stable exit codes that any agent or CI pipeline can branch on without parsing prose. Stagehand integrates at the code level since it's an SDK rather than a command.
Why do small local models struggle with browser automation?
Very small local models — roughly 8B parameters and under — tend to lose track of long, multi-step objectives, drifting or repeating actions partway through a flow like a checkout. They're fine for short, single-action checks but unreliable for complex sequences. The practical fix is to use a mid-size local model such as Qwen3 or a Llama 3.3 70B-class model, or a capable hosted model for genuinely hard flows, while keeping cheap local models for simple smoke tests.
Picking a terminal-native tool comes down to cost, data locality, and the artifacts you need when something breaks. If a free, account-free, local-first CLI fits your constraints, BrowserBash is one command away:
npm install -g browserbash-cli
No account is required to run it — but if you want the optional cloud dashboard later, you can sign up here.