guide· 14 min read· by Pramod Dutta

NDJSON Agent Mode: Make Browser Runs Callable by Scripts and AI

Learn BrowserBash NDJSON agent mode: the --agent stdout schema, exit codes 0-3, and how to call browser runs from scripts and AI agents.

Most browser automation tools were built for humans to read. You launch a run, watch a wall of colored text scroll by, and a person decides whether it passed. That falls apart the moment the caller is a shell script, a CI job, or an AI coding agent — none of them should be parsing prose to learn whether a login worked. BrowserBash's NDJSON agent mode exists to remove that guesswork: add --agent to any run and stdout becomes a stream of newline-delimited JSON with a stable schema, while the process exit code carries the verdict. A browser run stops being a thing you watch and becomes a thing you can call like a function.

This is a hands-on tutorial. By the end you will know every field the schema emits, what each of the four exit codes means and how to react to it, and how to wire a real run into a bash + jq loop, a CI job, and an AI agent — without a single regex against human-readable output.

Why NDJSON, and why a verdict in the exit code

BrowserBash is a free, open-source natural-language browser automation CLI. You write a plain-English objective, an AI agent drives a real Chrome or Chromium browser, and you get back a verdict plus structured results — no selectors, no page objects. In interactive use it prints friendly, human-readable progress. Agent mode flips that contract for machine callers.

NDJSON — newline-delimited JSON — means one complete JSON object per line, terminated by a newline. It is the right shape for a long-running process for two reasons:

Pairing NDJSON with an exit-code verdict is the second half of the contract. Prose was never an interface — log formats drift between releases, a summary line gets reworded, and a brittle grep "0 failures" silently falls through and reports a false green. By putting the verdict in the exit code, BrowserBash makes the integration robust: the caller never infers success from text. It reads a number.

Turning on agent mode

Agent mode is a single flag on the commands you already run. Here is the canonical invocation:

browserbash run "Open https://example.com/login, log in as standard_user with the team password, and store the logged-in display name as 'user_name'" \
  --agent \
  --headless \
  --timeout 120

Three flags are doing the work:

The most important behavioral detail in agent mode: stdout carries only NDJSON; everything human-readable goes to stderr. That separation is what lets you redirect a clean machine stream to a file while still seeing readable progress in your terminal or CI log:

browserbash run "Open https://example.com and store the page title as 'title'" \
  --agent --headless > run.ndjson
# run.ndjson is pure NDJSON; friendly logs still print to the console via stderr

If you forget the redirect, the two streams stay correctly separated anyway — NDJSON to stdout, prose to stderr — so piping stdout into jq never trips over a stray log line.

The NDJSON schema, field by field

Agent mode emits two kinds of events on stdout: zero or more step events while the run is in flight, then exactly one terminal run_end event. Every line is a complete JSON object and every line has a type field, so a consumer can branch on type and never guess.

Step events

As the agent acts on the browser, it streams one step event per action:

{"type":"step","step":3,"status":"running","action":"click","remark":"Clicking the Login button (ref:12)"}

The fields:

Because step events arrive live, you can tee them straight into a structured log. A coding agent can keep a running trail of exactly which actions ran, which is invaluable when a teammate later asks "what did the bot actually click?"

The run_end event

The final line is always a single terminal event that summarizes the whole run:

{
  "type": "run_end",
  "status": "passed",
  "summary": "Logged in successfully and captured the display name.",
  "final_state": {"user_name": "Q. Tester"},
  "duration_ms": 48211,
  "steps_executed": 9,
  "provider": "local",
  "test_url": null
}

The fields you will actually consume:

A practical rule for designing objectives around this schema: phrase every piece of data you need as store ... as 'some_name'. That is the contract that fills final_state, and final_state is the clean, typed channel back to your script. Everything else in run_end is metadata about the run itself.

Exit codes are the real API

The schema tells you what happened; the exit code tells you what to do about it. Every browserbash run and browserbash testmd run exits with one of four codes, and that single number is the contract a caller should branch on:

Exit code status Meaning
0 passed The objective held.
1 failed A real failure — the objective or a verify step did not hold.
2 error Infrastructure or agent problem (bad endpoint, grid hiccup, misconfiguration).
3 timeout The run exceeded its --timeout budget.

The granularity is the entire point, and the right response differs per code:

Collapsing 1 and 2 into a single "it failed" is the most common mistake, and it is corrosive: it trains your team to rerun genuine product failures as if they were flakes. Keep them distinct.

Build along: a bash + jq loop

Here is the smallest useful integration. Run an objective, capture the verdict from the exit code, and pull a stored value out of final_state:

out=$(browserbash run "Open https://example.com and store the page title as 'title'" --agent --headless)
code=$?
title=$(echo "$out" | tail -1 | jq -r '.final_state.title')
echo "exit=$code title=$title"

Three things make this robust. $? reads the verdict straight from the exit code. tail -1 isolates the always-last run_end line. jq -r '.final_state.title' extracts exactly the value you stored — no prose, no regex.

Scaling that up to a production-shaped wrapper that handles secrets and reacts per exit code:

out=$(browserbash run "Open https://staging.example.com/login, log in as {{username}} with password {{password}}, and store the logged-in display name as 'user_name'" \
  --agent --headless --timeout 120 \
  --variables '{"username":"qa@example.com","password":{"value":"hunter2","secret":true}}')
code=$?

# A structured trail of every action, for the run log:
echo "$out" | jq -c 'select(.type=="step")'

summary=$(echo "$out" | tail -1 | jq -r '.summary')

case $code in
  0) echo "PASS: $summary" ;;
  1) echo "FAIL: $summary — investigate the change" ;;
  2) echo "ERROR: $summary — retrying once" ;;
  3) echo "TIMEOUT: $summary — raise --timeout or split the objective" ;;
esac

Notice how credentials are passed. They ride in --variables as an object with "secret": true rather than being typed inline in the objective. Secret values are masked as ***** in the NDJSON stream too, which matters the moment an agent transcript or CI log gets archived verbatim. For more on the variables-and-secrets model, see the BrowserBash learn pages.

Build along: an AI coding agent that verifies its own work

This is the use case agent mode was named for. An AI coding agent can write the fix but cannot, on its own, see whether the page still renders. Agent mode closes that loop: the agent invokes browserbash run as a tool, branches on the exit code, and reads structured results from final_state — exactly the kind of clean tool interface an LLM is good at consuming, with no prose to misread.

A house ruleset that works well for agent callers:

An agent can also drive a browser it already controls. If your agent launched Chrome over the DevTools Protocol — for example through a Playwright-based MCP server — point BrowserBash at that same endpoint instead of letting it launch its own:

browserbash run "Verify the dashboard shows a welcome banner and store its text as 'banner'" \
  --agent --headless \
  --provider cdp --cdp-endpoint ws://localhost:9222/devtools/browser/<id>

BrowserBash then drives the existing session rather than spawning a fresh browser — useful when the agent has already authenticated or navigated somewhere it wants verified.

Build along: committable markdown tests

Agent mode is not limited to one-line objectives. BrowserBash also runs markdown tests — committable *_test.md files where each list item is a step, @import composes shared steps, and {{variables}} interpolate with the same secret masking. Run one with --agent and you get the identical NDJSON contract and exit codes, plus a written Result.md:

browserbash testmd run checkout_test.md --agent --headless --timeout 180 > checkout.ndjson
code=$?
tail -1 checkout.ndjson | jq -r '.status, .duration_ms, .final_state.order_id'
exit $code

This is the natural unit for CI: a reviewable test file lives next to your code, and the run emits a machine stream plus an exit-code verdict with no parsing step in between. Because the artifact is plain NDJSON, you can keep every run and mine duration_ms later to catch flakiness before it ever turns a build red.

Wiring it into CI

In a CI job, the exit code is your pass/fail gate — there is no "parse results" step to write, because the run step fails exactly when the test fails. Here is a GitHub Actions job that installs the CLI, runs a markdown test in agent mode, and always uploads the NDJSON artifact for debugging:

name: e2e
on: [push]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g browserbash-cli
      - run: browserbash testmd run checkout_test.md --agent --headless --timeout 180 > checkout.ndjson
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: checkout-ndjson
          path: checkout.ndjson

A couple of refinements turn this from working into production-grade. First, retry only the environment exit codes, never 1:

browserbash testmd run checkout_test.md --agent --headless --timeout 180 > checkout.ndjson
code=$?
if [ "$code" -eq 2 ] || [ "$code" -eq 3 ]; then
  echo "environment-flavored exit ($code) — retrying once" >&2
  browserbash testmd run checkout_test.md --agent --headless --timeout 180 > checkout.ndjson
  code=$?
fi
exit $code

Second, when you want a real browser farm and session replays rather than the CI runner's local Chrome, switch where the browser runs with a single flag:

browserbash testmd run checkout_test.md --agent --headless --provider lambdatest

That sends the run to a cloud grid; the resulting run_end event carries a test_url deep-linking to the recorded session, while the verdict and schema stay byte-for-byte identical. The provider is the one thing that changes — your consuming code does not.

Recording and pushing runs to a dashboard

Agent mode and recording compose cleanly. Add --record to capture a screenshot and a stitched session video (.webm) alongside the NDJSON, which is exactly what you want attached to a failing run:

browserbash run "Open https://example.com/checkout and complete a guest purchase" \
  --agent --headless --record

If you have created a free account and connected the CLI, add --upload to push the run to the cloud dashboard for run history and per-run replay:

browserbash connect --key bb_your_key_here
browserbash run "Open https://example.com/checkout and complete a guest purchase" \
  --agent --headless --record --upload

Nothing leaves your machine unless you pass --upload — by default every run is local. There is also a free, private local dashboard via browserbash dashboard if you would rather keep everything on your own machine. For walkthroughs of the dashboard and more end-to-end recipes, browse the BrowserBash blog.

A consumer's checklist

When you write something that calls BrowserBash in agent mode, hold yourself to this short list:

Follow those and a browser run behaves like any other well-mannered function in your toolchain: predictable input, structured output, an unambiguous return code.

FAQ

Why NDJSON instead of a single JSON document?

Because runs are long-lived and you want to react before they finish. NDJSON streams one parseable object per line, so a supervising script or agent can log progress live, spot a stall, and kill a runaway run early. And since the terminal run_end event is always the last line, tail -1 | jq retrieves the verdict without buffering or parsing the entire stream.

How do I tell a real test failure from a tooling failure?

By the exit code. 1 is a genuine failure of the app or the test's expectation — investigate the change and never silently retry it. 2 is an infrastructure or agent error such as a dead endpoint or a grid blip, where a single automatic retry is reasonable. 3 is a timeout, which means raise --timeout or split the objective into smaller runs.

How do I get a value out of a run and use it in the next step?

Capture it in the objective with store ... as 'name', and it appears in run_end.final_state under that key. Because run_end is always the last line, tail -1 run.ndjson | jq -r '.final_state.name' gives you the value to export or pass to a downstream step — no scraping of human-readable output.

Does agent mode work with markdown tests and cloud grids too?

Yes. browserbash testmd run file_test.md --agent emits the identical NDJSON schema and the same four exit codes as browserbash run, and also writes a Result.md. Switching where the browser runs is one flag — for example --provider lambdatest — and the verdict, schema, and your consuming code stay exactly the same; only the provider and test_url fields change.

Get started for free

BrowserBash is free and open source (Apache-2.0). Install it with npm install -g browserbash-cli, add --agent to any run, and you have a browser automation tool your scripts and AI agents can call like a function. To keep run history, recordings, and per-run replays in the cloud, create a free account at browserbash.com/sign-up — it stays free, and nothing leaves your machine until you choose to upload.

Try it on your own appnpm install -g browserbash-cli
Start learning