Exploratory Testing With an AI Agent, No Scripts

Exploratory testing has a tooling gap. Automation gets frameworks, CI budgets, and dashboards; exploration gets a charter doc and whatever you can click through in an afternoon. The compromise most QA engineers land on is the throwaway script — twenty minutes writing Playwright to check one thing, then deleting it. BrowserBash collapses that loop: a one-shot browserbash run with a plain-English objective is faster to type than the script you were going to throw away anyway.

Release week, illustrated

The scenario below is illustrative — a composite release week, framed so you can steal the workflow — but every command runs exactly as printed. Picture the lone QA engineer on a small e-commerce team. The release cuts Thursday. The charter list has 20 items: cart edge cases, pricing displays, login oddities. Historically, five of those items needed automation help, each became a 20–40 line scratch script, and all five were dead code by Friday.

One-shot objectives instead of scratch scripts

Each charter item becomes one command. A structural check that mixes an assertion with an extraction:

browserbash run "Open https://quotes.toscrape.com, verify there are exactly 10 quotes on the first page, and store the first quote's author as 'author'" --headless

If the verify clause is false, the run fails with exit code 1 — the check is the test. A multi-step flow, with credentials kept out of the command string:

browserbash run "Open {{base_url}}, log in as {{username}} with password {{password}}, add the Sauce Labs Backpack to the cart, and store the cart badge count as 'cart_count'" \
  --headless \
  --variables '{"base_url":"https://www.saucedemo.com","username":"standard_user","password":{"value":"secret_sauce","secret":true}}'

And a check that needs actual reasoning over the page — the kind that is miserable to script by hand:

browserbash run "Open https://books.toscrape.com, navigate to the Travel category, store the number of books shown as 'count' and the price of the most expensive book on the page as 'max_price'" --headless --timeout 120

Finding a maximum means the agent reads every price on the page. Give deep checks --timeout headroom, and raise --max-steps past the default 30 when a flow is long.

store-as is the evidence locker

The habit that makes this exploratory testing rather than casual poking: end every objective with store ... as 'name'. Anything phrased that way lands in the run's final state, which turns each check into a record instead of a memory. Add --agent and stdout becomes NDJSON you can append to a session log:

browserbash run "Open {{base_url}}, log in as {{username}} with password {{password}}, add the Sauce Labs Backpack to the cart, and store the cart badge count as 'cart_count'" \
  --agent --headless \
  --variables '{"base_url":"https://www.saucedemo.com","username":"standard_user","password":{"value":"secret_sauce","secret":true}}' \
  | tee -a release-notes.ndjson

The last line of every run is a run_end event carrying the verdict, a one-paragraph summary, the stored values, and the duration in milliseconds — paste it into the bug ticket as-is. Values marked "secret": true stay masked as ***** even in NDJSON, so the log is safe to attach.

What two afternoons produced

In the illustrative week: 14 one-shot checks across two afternoons, three real finds. A cart badge that read 2 after adding a single item when the session arrived through a promo link. A category page whose "highest price first" sort disagreed with the stored max_price. A login that bounced through a redirect loop on a stale session cookie. Each ticket included the exact command — rerunnable by any developer — and the run_end line as evidence. Scratch scripts left behind to maintain: zero.

Promote the keepers

Two of the 14 checks were worth keeping. Graduating a one-shot into a committable test is a copy-paste:

cat > cart_badge_test.md <<'EOF'
# Cart badge smoke

- Open {{base_url}}
- Log in as {{username}} with password {{password}}
- Add the Sauce Labs Backpack to the cart
- Verify the cart badge shows exactly 1
- Store the cart badge count as 'cart_count'
EOF

browserbash testmd run ./cart_badge_test.md --headless

After the run, a Result.md report appears next to the test file with the verdict and extracted values. The check you explored your way into on Tuesday is a reviewable regression gate by Thursday.

FAQ

Isn't exploratory testing supposed to be a human activity?

It still is. The charter, the hunches, and the "that number looks wrong" judgment stay human — the agent removes the typing between hunch and evidence. You steer one objective at a time, read the result, and decide what to probe next.

How do I keep evidence from runs?

Three layers. Phrase extractions as store ... as 'name' so values land in the final state. Add --agent and append the NDJSON stream to a session log. And for promoted *_test.md files, every browserbash testmd run writes a Result.md next to the test.

What about dynamic pages that change between runs?

Anchor assertions on invariants — counts, presence, relationships — and use store for volatile values instead of asserting exact text: store the top story title rather than insisting on what it says. For slower or deeper pages, raise --timeout and --max-steps.