case study · the testing academy
We pointed BrowserBash at our own Playwright suite
The Testing Academy maintains a production Playwright + TypeScript framework that tests TTACart, our demo store — login, add to cart, full checkout, order confirmation. We took that exact end-to-end journey and rewrote it as one plain-English file, then ran it with a single command on a free local model — driving a real browser against the live app, the whole session recorded for replay.
data-test selectors to write or maintain
the suite under test
A real, production Playwright framework
AdvancePlaywrightFramework1x is our open-source, batteries-included Playwright + TypeScript suite: the Page Object Model, custom fixtures, Faker data factories, a Winston logger, Allure plus a custom TTA-branded HTML reporter, and a GitHub Actions pipeline. It exercises TTACart, a SauceDemo-style store at app.thetestingacademy.com/playwright/ttacart, across login, inventory, cart and a three-step checkout.
- login.spec.ts — signs in as standard_user and asserts the form is gone.
- e2e-checkout.spec.ts — the flagship: login → add item → cart → checkout → “Thank you for your order!”
- apiTests/ — a serial CRUD flow (token → create → update) that runs green in CI.
- Recorded by design — Playwright video on, per-step screenshots, traces on retry.
This isn’t a toy. It’s a maintained suite that passes in CI. We wanted to know: could the same coverage be written by anyone on the team in plain English — and still drive the real browser? It can.
the transformation
From six page objects to one paragraph
Left: the real e2e-checkout.spec.ts — page objects, fixtures, selectors, assertions. Right: the exact same journey as examples/ttacart_checkout_test.md, shipped in the BrowserBash repo. No selectors. No page objects. Just intent.
// e2e-checkout.spec.ts · AdvancePlaywrightFramework1x
test.beforeEach(async ({ loginPage }) => {
  await loginPage.open();
  await loginPage.loginAs(
    credentials.standardUser, credentials.password);
});
test('should complete checkout successfully', async ({
  inventoryPage, cartPage, checkoutStepOnePage,
  checkoutStepTwoPage, checkoutCompletePage,
}) => {
  const customer = DataGenerator.checkoutCustomer();
  await inventoryPage.open();
  await inventoryPage.addToCart('test-allthethings-tshirt-red');
  await cartPage.open();
  expect(await cartPage.rowCount()).toBe(1);
  await cartPage.checkout();
  await checkoutStepOnePage.fillGuest(customer);
  await checkoutStepOnePage.continue();
  await checkoutStepTwoPage.finish();
  await checkoutCompletePage.assertOrderComplete();
});
// + 6 page objects (Login, Inventory, Cart, CheckoutStepOne,
// CheckoutStepTwo, CheckoutComplete), fixtures, BasePage,
// UtilElementLocator, data-test selectors, Faker factories…

# TTACart end-to-end checkout
- Open the TTACart login page
- Log in as standard_user with the password tta_secret
- Go to the products inventory page
- Add the "Test.allTheThings() T-Shirt (Red)" to the cart
- Open the cart and verify it contains exactly 1 item
- Click Checkout
- Fill the checkout details: first name Pramod, last name Dutta, postal code 560001
- Continue to the order overview, then click Finish
- Verify the page shows "Thank you for your order!"
Credentials shown here are TTACart’s public demo creds. For real apps, pass values as {{variables}} — BrowserBash masks them as ***** in every log line, event and summary.
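To make the masking idea concrete, here is a minimal sketch of how {{variable}} substitution and log masking could work. It is an illustration only, not BrowserBash's implementation: the function names `fillVariables` and `maskSecrets` are hypothetical, and the only behavior taken from the text above is that secret values show up as ***** in output.

```typescript
// Hypothetical helpers for illustration; not part of the BrowserBash API.

// Replace each {{name}} placeholder with its value; unknown names are left as-is.
function fillVariables(step: string, vars: Record<string, string>): string {
  return step.replace(/\{\{(\w+)\}\}/g, (match: string, name: string): string =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match
  );
}

// Replace every occurrence of every variable value with a fixed mask.
// Longer values are masked first so a short value that is a substring
// of a longer one cannot leave part of the longer value visible.
function maskSecrets(line: string, vars: Record<string, string>): string {
  const values = Object.keys(vars)
    .map((key) => vars[key])
    .filter((value) => value.length > 0) // an empty value would match everywhere
    .sort((a, b) => b.length - a.length);
  let masked = line;
  values.forEach((value) => {
    masked = masked.split(value).join("*****");
  });
  return masked;
}

const demoVars = { password: "tta_secret" };
const filledStep = fillVariables(
  "Log in as standard_user with the password {{password}}",
  demoVars
);
console.log(maskSecrets(filledStep, demoVars));
// prints "Log in as standard_user with the password *****"
```

The point of the design is that the test file holds only the placeholder, so the real value never needs to be committed, and anything that reaches a log passes through the mask first.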
how we ran it
One command runs the whole journey
No build step, no page.locator, no waiting code. The AI agent reads each line, finds the element on the live page, acts, and keeps the logged-in session alive from the first step to the confirmation screen.
$ browserbash testmd run \
    examples/ttacart_checkout_test.md --record --upload
Engine: stagehand (MIT, stagehand.dev)
Recording session video (--record)
→ opening app.thetestingacademy.com/playwright/ttacart in local Chromium
→ login · inventory · add to cart · checkout · finish
→ one browser context across every step · recorded with --record
Write the intent
Plain-English steps in a committable *_test.md — or generate them from an existing spec.
Run one command
Local Chrome by default. Add --provider lambdatest or browserstack for a grid.
Drive a real browser
The agent finds elements live — the session stays logged in through checkout.
Get a verdict + replay
An exit code, a Result.md, and a recorded video — uploaded to your dashboard.
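The test file in the first step is plain enough to inspect with a few lines of code. The sketch below assumes the layout seen in the TTACart example (one `# ` title line followed by `- ` step lines); the `TestPlan` type and `parseTestMarkdown` function are hypothetical names for illustration, and BrowserBash's own reader may accept more than this.

```typescript
// Hypothetical reader for a *_test.md file; assumes a "# title" line
// and one "- step" line per instruction, as in the TTACart example.
interface TestPlan {
  title: string;
  steps: string[];
}

function parseTestMarkdown(source: string): TestPlan {
  let title = "";
  const steps: string[] = [];
  source.split(/\r?\n/).forEach((raw) => {
    const line = raw.trim();
    if (line.indexOf("# ") === 0 && title === "") {
      title = line.slice(2).trim(); // first heading becomes the title
    } else if (line.indexOf("- ") === 0) {
      steps.push(line.slice(2).trim()); // each bullet is one step, in order
    }
    // any other line (blank lines, notes) is ignored in this sketch
  });
  return { title, steps };
}

const samplePlan = parseTestMarkdown(
  [
    "# TTACart end-to-end checkout",
    "- Open the TTACart login page",
    "- Click Checkout",
  ].join("\n")
);
console.log(samplePlan.steps.length); // prints 2
```

A reader like this is handy for linting, for example failing a pull request when a test file has a title but no steps.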
recorded, on a real browser
Plain English in, real TTACart out
With --record, BrowserBash captures a session video and a screenshot on every engine (the builtin engine also saves a Playwright trace). Below is the actual frame BrowserBash captured driving a local Chromium against the live TTACart — not a mockup, the real app.
--record capture: BrowserBash opening TTACart in a local Chromium. The CLI keeps everything on your machine until you add --upload.
the dashboard
Every run, recorded and replayable
Add --upload (after a one-time browserbash connect) and the run streams to your free BrowserBash dashboard: run history, status, the video replay, and a per-run page you can share with the team. This is the “showcase in a dashboard” part — a living record of every TTACart journey, not a wall of CI logs.
- Run history — every objective, with pass / fail and duration.
- Video replay — watch exactly what the agent saw, step by step.
- Per-run share link — send a teammate the replay, not a stack trace.
- Free tier — uploaded runs kept 15 days; optional retention for longer.
when something breaks
A failure tells you exactly where
Tests exist to catch regressions, so the failure path matters as much as the happy path. If TTACart ever stops saying “Thank you for your order!”, BrowserBash marks that step failed, captures a screenshot at the point of failure, writes the reason to Result.md, and exits non-zero so CI goes red, with no prose to parse.
✓ [5] act: click Finish
✗ [6] verify the page shows "Thank you for your order!"
FAILED — expected text not found · screenshot saved
exit code 1 · Result.md written · CI fails the build
| Exit code | Meaning |
|---|---|
| 0 | Passed — every step succeeded |
| 1 | Failed — an assertion or step did not pass |
| 2 | Error — the run could not execute |
| 3 | Timeout — the run exceeded its budget |
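Because the verdict is carried by the exit code, a CI wrapper can branch on it without reading any output. The sketch below maps the four codes from the table to labels. The `shouldRetry` policy (retry only when the run itself could not complete) is our own assumption about a sensible CI rule, not something BrowserBash prescribes, and both function names are hypothetical.

```typescript
type Verdict = "passed" | "failed" | "error" | "timeout" | "unknown";

// Map the documented exit codes to a label; anything else is "unknown".
function verdictFromExitCode(code: number): Verdict {
  switch (code) {
    case 0:
      return "passed";
    case 1:
      return "failed";
    case 2:
      return "error";
    case 3:
      return "timeout";
    default:
      return "unknown";
  }
}

// Assumed CI policy: a real test failure (1) should fail the build at once,
// while an execution error (2) or timeout (3) may deserve one retry.
function shouldRetry(code: number): boolean {
  const verdict = verdictFromExitCode(code);
  return verdict === "error" || verdict === "timeout";
}

console.log(verdictFromExitCode(1), shouldRetry(1)); // prints "failed false"
```

Keeping a genuine failure distinct from an infrastructure problem stops flaky environments from being mistaken for regressions, and the reverse.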
the model & the cost
Free locally — or pennies on a hosted model
BrowserBash is model-agnostic and resolves in this order: a local Ollama model first, then your Anthropic or OpenRouter key if set. For TTACart we ran a local model — $0, fully private, nothing leaving the machine.
- Free & local — ollama pull qwen3, then run. Best for short, direct flows; no keys, no cost.
- Cheap & hosted — long multi-step journeys are most reliable on a stronger model. A budget OpenRouter model like deepseek/deepseek-chat or a Qwen model costs a few cents per run.
- One flag to switch — --model openrouter/deepseek/deepseek-chat or --model ollama/qwen3. Same test file, your choice of brain.
- No lock-in — swap models or grids without touching the test.
Tiny local models are great for simple objectives; a full login-to-checkout journey is more reliable on a stronger (still cheap) hosted model. BrowserBash lets you pick per run — start free, scale up only when the flow demands it.
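The resolution order described in this section (local Ollama first, then a hosted key) can be pictured as a simple fallback chain. This is a sketch under stated assumptions, not BrowserBash's code: the `Environment` shape and `resolveProvider` name are hypothetical, and the relative order of the Anthropic and OpenRouter keys is our guess, since the text only says both come after Ollama.

```typescript
type Provider = "ollama" | "anthropic" | "openrouter";

// Hypothetical description of what is available on the machine.
interface Environment {
  ollamaAvailable: boolean; // assumption: the caller has already checked for a local model
  anthropicKey?: string;
  openRouterKey?: string;
}

// Pick the first available provider, or null when nothing is configured.
function resolveProvider(env: Environment): Provider | null {
  if (env.ollamaAvailable) return "ollama"; // free and local wins when present
  if (env.anthropicKey) return "anthropic"; // assumed to be checked before OpenRouter
  if (env.openRouterKey) return "openrouter";
  return null;
}

console.log(resolveProvider({ ollamaAvailable: true, openRouterKey: "set" }));
// prints "ollama"
```

An explicit --model flag, as described in the list above, would bypass a chain like this for a single run.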
Run it on your own app
Install the CLI and turn your next test into a sentence.
npm install -g browserbash-cli