Tutorial · 12 min read · by Pramod Dutta

Stagehand vs builtin engine in BrowserBash: when to use each

Hands-on guide to the BrowserBash engine choice: Stagehand vs builtin, the --engine flag, why grids auto-use builtin, and traces on builtin runs.

By the end of this tutorial you'll know exactly which BrowserBash engine to reach for on any given run, how to switch between them with the --engine flag, and why a grid provider like LambdaTest or BrowserStack quietly forces the builtin engine no matter what you ask for. The word "engine" gets thrown around a lot in browser automation, so let's pin it down: in BrowserBash, the engine is the component that reads your plain-English objective and decides what to click, type, and verify in a real Chrome browser. There are two of them — stagehand (the default) and builtin — and they behave differently enough that picking the right one can be the difference between a clean pass and a flaky, half-finished run. We'll run both side by side against the same objective, read the verdicts, and pull a Playwright trace out of a builtin run so you can see exactly what the agent did.

BrowserBash is a free, open-source CLI from The Testing Academy. You write an objective in English, an AI agent drives a real browser step by step (no selectors, no page objects), and you get back a pass/fail verdict plus any structured values it extracted. Everything below is real and runnable.

What you'll need

This is a hands-on lesson, so set yourself up before we start. The free local path needs almost nothing.

npm install -g browserbash-cli

That's it. No account, no signup, no API key for the local path. If you want a model with zero keys, pull one with Ollama first:

ollama pull qwen3

Confirm the install resolved:

browserbash --version

You should see 1.3.1 or later printed back.
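
If you script your setup, the "1.3.1 or later" check can be automated. The sketch below assumes that browserbash --version prints a bare MAJOR.MINOR.PATCH string, as the expected output above suggests. version_at_least is a hypothetical helper, not something the CLI ships.

```shell
# Hypothetical helper (not part of BrowserBash): succeeds when version $1 >= version $2.
# Assumes both arguments are plain MAJOR.MINOR.PATCH strings such as 1.3.1.
version_at_least() {
  actual=$1
  minimum=$2
  old_ifs=$IFS
  IFS=.
  # Split both versions on dots: $1 $2 $3 = actual, $4 $5 $6 = minimum.
  set -- $actual $minimum
  IFS=$old_ifs
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Only query the CLI when it is actually on PATH.
if command -v browserbash >/dev/null 2>&1; then
  if version_at_least "$(browserbash --version)" 1.3.1; then
    echo "browserbash is new enough"
  else
    echo "browserbash is older than 1.3.1 (or printed an unexpected version string)" >&2
  fi
fi
```

If your install prints the version with a prefix or extra text, strip that before passing it to the helper.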

Two engines, one sentence each

Before the first command, here's the mental model. Both engines read the same English objective and drive the same real browser. What differs is how they interpret your words into browser actions.

| Engine | Who builds it | License | How it interprets English | Auto-used when |
| --- | --- | --- | --- | --- |
| stagehand | Browserbase | MIT | High-level primitives (act, extract, observe, agent) with self-healing built in | Default for local, cdp, and browserbase providers |
| builtin | In-repo (BrowserBash) | Apache-2.0 (ships with the CLI) | An Anthropic tool-use loop driving Playwright directly | Forced automatically for lambdatest and browserstack |

Read that table twice, because it explains nearly everything that follows. Stagehand is a layer that specializes in turning intent into resilient browser actions and re-finding elements when a page shifts. The builtin engine is a tighter loop: the model is handed Playwright as a set of tools and reasons step by step, and — crucially for debugging — it can emit a Playwright trace when you record a run.

Step 1 — Run the default engine (Stagehand)

Let's establish a baseline. Run any objective without an --engine flag and you get Stagehand, because the default local provider uses it.

browserbash run "Go to https://news.ycombinator.com and tell me the title of the top story"

What you'll see: the agent launches your Chrome, navigates, reads the page, and prints a verdict. A passing run ends with something like a PASSED status and a short summary — for this objective, the extracted title of the current top Hacker News story. The point isn't the story; it's that you ran the default BrowserBash engine without naming it. Stagehand did the interpreting.

You don't need a flag to watch it happen: the browser window is visible by default and is only hidden when you pass --headless.

Step 2 — Switch engines explicitly with --engine

Now run the same objective on the builtin engine. One flag flips it:

browserbash run "Go to https://news.ycombinator.com and tell me the title of the top story" --engine builtin

The verdict should match: same top story, same PASSED. The change is invisible on a task this simple. Under the hood, the builtin engine reasoned through a tool-use loop (navigate, read, extract) rather than calling Stagehand's act/extract primitives. On a one-step read like this, you won't feel the difference. You will in Step 4.

The --engine flag accepts exactly two values:

| Flag | Engine selected | Notes |
| --- | --- | --- |
| --engine stagehand | Stagehand | Same as omitting the flag on local/cdp/browserbase |
| --engine builtin | Builtin tool-use loop | Required for trace output; auto-selected on grids |

There is no third value. If you mistype it, the run errors out rather than silently falling back — which is the behavior you want in CI.

Step 3 — Why grids auto-use the builtin engine

Here's the gotcha that trips people up. Run against a Selenium grid provider and your --engine choice is overridden:

browserbash run "Open the login page and confirm the username field is present" \
  --provider lambdatest

Even though you didn't pass --engine builtin, that's what runs. The same is true for BrowserStack:

browserbash run "Open the login page and confirm the username field is present" \
  --provider browserstack

Why? Grid providers like LambdaTest and BrowserStack expose remote browsers over a protocol that the builtin engine's Playwright-driven loop talks to directly and reliably. So BrowserBash makes the safe choice for you: lambdatest and browserstack always use the builtin engine. If you explicitly pass --engine stagehand alongside one of those providers, don't expect it to take effect — the grid wins. This isn't a limitation to fight; it's the supported path for cross-browser grid runs. Treat the provider as the thing that decides the engine here.
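
The override rule is easy to encode if a script needs to know which engine will really run. The following is a minimal sketch of the rules as this tutorial describes them. effective_engine is a hypothetical helper and does not inspect BrowserBash itself.

```shell
# Hypothetical helper: prints the engine that will actually run,
# following the rules described in this tutorial.
#   $1 = provider (local, cdp, browserbase, lambdatest, browserstack)
#   $2 = requested engine (optional; empty means no --engine flag was passed)
effective_engine() {
  provider=$1
  requested=${2:-}
  case "$provider" in
    lambdatest|browserstack)
      # Grid providers always run builtin, whatever was requested.
      echo builtin
      ;;
    *)
      # Elsewhere the flag is honored, and Stagehand is the default.
      echo "${requested:-stagehand}"
      ;;
  esac
}
```

For example, effective_engine lambdatest stagehand prints builtin, while effective_engine local prints stagehand.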

These two providers need credentials in your environment before the run will start:

| Provider | Required environment variables |
| --- | --- |
| lambdatest | LT_USERNAME, LT_ACCESS_KEY |
| browserstack | BROWSERSTACK_USERNAME, BROWSERSTACK_ACCESS_KEY |

Set those, and the same English objective you wrote for local runs executes on a remote grid browser — with the builtin engine doing the interpreting whether you asked for it or not.
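
A preflight check can catch missing credentials before a grid run is attempted. This is a rough sketch that uses only the variable names listed above. missing_grid_credentials is a hypothetical helper, not a BrowserBash command.

```shell
# Hypothetical preflight: prints the names of required variables that are unset or empty.
# An empty result means nothing is missing for that provider.
missing_grid_credentials() {
  case "$1" in
    lambdatest)   required="LT_USERNAME LT_ACCESS_KEY" ;;
    browserstack) required="BROWSERSTACK_USERNAME BROWSERSTACK_ACCESS_KEY" ;;
    *)            required="" ;;  # Other providers need none of these variables.
  esac
  missing=""
  for name in $required; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # Drop the leading space before printing.
  echo "${missing# }"
}
```

In a pipeline you might call missing_grid_credentials lambdatest and stop early with a clear message if the output is non-empty.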

Step 4 — Run a real multi-step flow on each engine

Single reads don't separate the engines. Multi-step flows do. Let's give both engines something with state (log in, add an item to the cart, read a value back) and compare.

4a — Stagehand on the flow

browserbash run "Go to https://www.saucedemo.com, log in as standard_user with password secret_sauce, add the first product to the cart, and confirm the cart badge shows 1"

Stagehand's self-healing earns its keep here. When the page re-renders after login, Stagehand re-observes the DOM and re-finds the cart button rather than clutching a stale reference. A clean run ends PASSED with a summary noting the cart badge read 1.

4b — Builtin on the same flow

browserbash run "Go to https://www.saucedemo.com, log in as standard_user with password secret_sauce, add the first product to the cart, and confirm the cart badge shows 1" \
  --engine builtin

Same objective, builtin engine. The tool-use loop will navigate, fill the login fields, click add-to-cart, and read the badge — narrating each tool call as a step. You'll typically see a PASSED here too. The reason to prefer builtin for a flow like this isn't the verdict — it's what you can extract afterward, which is the next step.
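
To run 4a and 4b back to back, a small loop is enough. The sketch below uses only the run command and the --engine flag shown above. BROWSERBASH_BIN is a hypothetical variable introduced here so the loop can be dry-run with echo.

```shell
# BROWSERBASH_BIN is a hypothetical override, e.g. BROWSERBASH_BIN=echo for a dry run.
BROWSERBASH_BIN=${BROWSERBASH_BIN:-browserbash}

# Runs one objective on stagehand, then on builtin, and reports each exit code.
compare_engines() {
  objective=$1
  for engine in stagehand builtin; do
    code=0
    # Capture a non-zero exit code instead of aborting the loop.
    "$BROWSERBASH_BIN" run "$objective" --engine "$engine" || code=$?
    echo "engine=$engine exit_code=$code"
  done
}
```

Calling compare_engines with the saucedemo objective prints one summary line per engine, which makes a mismatch between the two easy to spot.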

A quick honesty note that matters for both engines: very small local models (8B or under) get flaky on long multi-step objectives. They lose the thread halfway through a five-step flow regardless of engine. If a run like 4a or 4b stalls or wanders, that's almost always the model, not the engine. Move to a mid-size local model (Qwen3 / Llama 3.3 70B-class) or pin a hosted model:

browserbash run "Go to https://www.saucedemo.com, log in as standard_user with password secret_sauce, add the first product to the cart, and confirm the cart badge shows 1" \
  --engine builtin \
  --model openrouter/meta-llama/llama-3.3-70b-instruct

(That last command needs OPENROUTER_API_KEY set. The point stands for any capable backend.)

Step 5 — Get a Playwright trace out of a builtin run

This is the builtin engine's superpower for debugging. Add --record and the builtin engine writes a Playwright trace alongside the usual screenshot and .webm session video.

browserbash run "Go to https://www.saucedemo.com, log in as standard_user with password secret_sauce, add the first product to the cart, and confirm the cart badge shows 1" \
  --engine builtin \
  --record

What --record produces: a screenshot, a .webm session video, and, on the builtin engine only, a Playwright trace.

Open the trace the way you'd open any Playwright trace, pointing the viewer at the trace file the run wrote:

npx playwright show-trace <path-to-trace-file-from-the-run-output>

The run prints where it saved artifacts, and every run is also kept on disk at ~/.browserbash/runs (secrets masked, capped at the most recent 200). When a builtin flow fails on step three of five, this trace is how you find out whether it was the login, the add-to-cart click, or the badge read — without re-running blind.
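
If you have lost the run output, the newest entry under ~/.browserbash/runs is a reasonable place to start looking. This sketch assumes each run is stored as its own entry in that directory. The internal layout of an entry is not covered here, so the helper only returns the newest path. latest_run is a hypothetical name.

```shell
# Hypothetical helper: prints the most recently modified entry in a runs directory.
# Assumption: each run is kept as its own file or folder directly under that directory.
latest_run() {
  runs_dir=${1:-"$HOME/.browserbash/runs"}
  # Fail quietly when the directory does not exist.
  [ -d "$runs_dir" ] || return 1
  # ls -t sorts newest first, so the first line is the latest entry.
  newest=$(ls -t "$runs_dir" | head -n 1)
  [ -n "$newest" ] || return 1
  echo "$runs_dir/$newest"
}
```

List the returned entry to locate the trace file, then pass that file's path to npx playwright show-trace.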

Here's the rule worth tattooing on your CI config: if you want a Playwright trace, use --engine builtin. Stagehand runs still give you the screenshot and .webm video with --record; the Playwright trace specifically is a builtin-engine output.

The flags that matter for this lesson

| Flag | What it does | Engine note |
| --- | --- | --- |
| --engine <stagehand or builtin> | Picks the interpreter | Ignored on lambdatest/browserstack (always builtin) |
| --provider | Picks where the browser runs | lambdatest/browserstack force builtin |
| --model | Pins the LLM backend (e.g. ollama/qwen3) | Same syntax on both engines |
| --record | Screenshot + .webm video; builtin also writes a Playwright trace | Trace is builtin-only |
| --headless | Hides the browser window | Both engines |
| --timeout <seconds> | Caps run duration | Both engines |
| --agent | NDJSON output, one JSON object per line | Both engines |

Step 6 — Read both engines in CI with --agent

In a pipeline you don't want to parse prose. Add --agent and either engine emits NDJSON — one JSON object per line — which is identical in shape regardless of engine, so your CI step doesn't care which interpreter ran.

browserbash run "Go to https://www.saucedemo.com, log in as standard_user with password secret_sauce, and confirm the inventory page loads" \
  --engine builtin \
  --agent

You'll get progress lines like {"type":"step","step":1,"status":"passed","action":"navigate","remark":"..."} followed by a terminal line such as {"type":"run_end","status":"passed","summary":"...","final_state":{...},"duration_ms":...}. Exit codes make the gate trivial: 0 passed, 1 failed, 2 error, 3 timeout. Because the NDJSON contract is engine-agnostic, you can switch an objective from Stagehand to builtin (or let a grid provider force builtin) without touching the script that reads its output. That's the whole reason --agent exists — CI and AI coding agents consume it, no prose parsing required.
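
A CI step can lean on the exit code and read the final status as a cross-check. The sketch below maps the four exit codes listed above to their labels and pulls the status out of the run_end line. It assumes compact JSON with "status" directly after "type", exactly as in the sample line. Both function names are hypothetical.

```shell
# Maps a BrowserBash exit code to the label given in this tutorial.
describe_exit() {
  case "$1" in
    0) echo passed ;;
    1) echo failed ;;
    2) echo error ;;
    3) echo timeout ;;
    *) echo unknown ;;
  esac
}

# Reads NDJSON on stdin and prints the status of the run_end line.
# Assumption: compact JSON with "status" right after "type", as in the sample above.
run_end_status() {
  sed -n 's/^{"type":"run_end","status":"\([a-z_]*\)".*/\1/p'
}

# Possible use in a pipeline step (not executed here):
#   code=0
#   browserbash run "..." --engine builtin --agent > run.ndjson || code=$?
#   echo "exit: $(describe_exit "$code"), status: $(run_end_status < run.ndjson)"
```

The sed pattern is deliberately strict. If the key order or spacing differs in your output, a real JSON parser such as jq is the safer choice, and the exit code remains the primary gate either way.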

Troubleshooting

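Real failure modes you'll hit, and how to clear them.

The run stalls or wanders partway through a multi-step flow. That is almost always the model, not the engine. Move off very small local models (8B or under) to a mid-size local model, or pin a hosted one with --model.

--engine stagehand seems to be ignored. Check the provider: lambdatest and browserstack always run the builtin engine, so the flag has no effect there.

The run errors out as soon as it starts. Check the spelling of the --engine value; only stagehand and builtin are accepted. On grid providers, also confirm the credentials are set: LT_USERNAME and LT_ACCESS_KEY for lambdatest, BROWSERSTACK_USERNAME and BROWSERSTACK_ACCESS_KEY for browserstack.

There is no Playwright trace among the artifacts. The trace is only written when --record is combined with --engine builtin; Stagehand runs record the screenshot and .webm video only.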

When to use this

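A short decision rule you can keep:

Use stagehand (the default) for local, cdp, and browserbase runs, especially on pages that re-render, where its self-healing helps.

Use builtin when you want a Playwright trace from --record.

On lambdatest and browserstack there is nothing to choose: the provider forces builtin.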


FAQ

What is the difference between the Stagehand and builtin engine in BrowserBash?

The Stagehand engine, built by Browserbase under an MIT license, interprets your English using high-level act, extract, observe, and agent primitives with self-healing, and it is the default. The builtin engine is an in-repo Anthropic tool-use loop that drives Playwright directly. Both read the same objective and drive the same real browser; they differ in how they turn your words into actions and in what artifacts they can produce.

How do I switch the BrowserBash engine?

Add the --engine flag to any run command, with either stagehand or builtin as the value, for example browserbash run "..." --engine builtin. Omitting the flag gives you Stagehand on local, cdp, and browserbase providers. There is no third value, and a typo will error the run rather than silently falling back.

Why does BrowserBash use the builtin engine on LambdaTest and BrowserStack?

Grid providers expose remote browsers over a protocol that the builtin engine's Playwright-driven loop talks to directly and reliably, so BrowserBash forces the builtin engine on lambdatest and browserstack runs. This happens automatically even if you pass --engine stagehand, because the provider decides the engine on grids. It is the supported path for cross-browser grid runs, not a limitation to work around.

How do I get a Playwright trace from BrowserBash?

Run with the builtin engine and add --record, for example browserbash run "..." --engine builtin --record. On the builtin engine that produces a Playwright trace in addition to the screenshot and .webm session video, and you open it with the Playwright trace viewer. Stagehand runs still record a screenshot and video, but the Playwright trace specifically is a builtin-engine output.

Pick an engine, run a flow, read the verdict. Install it and try both on your own app — no account required:

npm install -g browserbash-cli

Want the optional cloud dashboard later? Signing up stays optional, and nothing leaves your machine unless you explicitly --upload a run.
