AI Tools for SDETs: Augment Your Test Automation

The best AI tools for SDETs are not the ones that promise to replace your test framework. They are the ones that absorb the work your framework makes expensive: the brittle locators, the one-off checks you never got around to scripting, the smoke tests a product manager wishes they could read. If you are a Software Development Engineer in Test who already owns a Playwright or Selenium suite, the interesting question is not "should I switch?" It is "what should I keep in code, and what should I hand to an AI agent that drives a real browser from a plain-English sentence?" This article answers that with concrete commands, a CI recipe, and a clear line between the two worlds, using BrowserBash, a free, open-source CLI built for exactly this seam.

I am going to be blunt up front: your Playwright tests are an asset, and nothing here suggests you delete them. A large, stable, deterministic regression suite is one of the best things a QA org can own. What I will argue is that there is a band of work — new coverage you need today, UIs that churn weekly, exploratory passes, executable documentation — where writing and maintaining selectors is pure overhead, and that is the band where an AI agent earns its place beside your framework rather than against it.

What changed for SDETs

The SDET role has always been a translation job. A product manager says "users should be able to reset their password and log back in." You translate that intent into page.getByRole('textbox', { name: 'Email' }), a sequence of clicks, an assertion, and a wait strategy that survives a slow network. The test is correct, it is fast, and it is fragile in one specific way: it is welded to the structure of the page, not to the intent of the user. Rename a data-testid, move a button into a dropdown, regenerate a CSS class, and a green feature turns red for reasons that have nothing to do with whether the product works.

For a decade the answer to that fragility was discipline: role-based locators, data-testid conventions, page object models, fixtures, and code review. That discipline works, and it also has a carrying cost that compounds with every screen you cover. Modern AI tools for SDETs change the economics of that cost. Instead of translating intent into structure, you can now write the intent itself — "reset the password and log back in" — and let an agent read the live page and find the elements the way a human would, on every run. The structure is rediscovered each time, so churn in the DOM does not automatically break the test.

This is not magic, and it is not free of trade-offs. An agent driving a browser through a language model is slower than a direct protocol command, and it is non-deterministic in a way a scripted click is not. Those properties make it a poor fit for an 800-test regression wall and a great fit for the work that wall was never good at. The skill that matters now is knowing which is which.

Where an AI agent fits next to Playwright and Selenium

Think of your testing surface as three layers, and assign each layer to the tool that is cheapest for it.

The regression wall stays in code. Hundreds of deterministic checks, sharded across CI workers and finishing in minutes: this is precisely what Playwright and Selenium are built for. Direct protocol commands take milliseconds, not seconds, and a scripted failure reproduces identically every time, which is what makes a CI gate trustworthy. Do not move this to an agent. You would trade speed and determinism for nothing.

New and high-churn coverage goes to the agent. A feature shipped this morning, a regression slipped through this afternoon, and the test you wish you had does not exist yet because writing the locators was never the fast path. This is where authoring speed wins. You describe the flow in a sentence, the agent runs it against a real browser, and you have coverage in the time it takes to type the objective. When the UI is still moving weekly, the agent's "re-read the page every run" behavior means you are not patching selectors on a feature that is not even stable yet.

Exploratory and acceptance work is a natural fit too. A bug-bash hypothesis ("can a logged-out user reach the billing page?"), a smoke test a product manager should be able to review in a pull request, an acceptance check phrased exactly the way the ticket was written — these read like documentation and run like tests. The intent and the test live in the same language for once.

The practical rule of thumb: if a flow will run forever and must be fast and identical every time, keep it in your framework. If it is new, churning, exploratory, or meant to be read by a non-engineer, hand it to the agent. Most teams end up with both, wired into the same pipeline, which is the whole point.
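To make the rule of thumb concrete, here is a minimal Python sketch that encodes it as a function. The Flow fields and the choose_tool name are hypothetical and exist only for this illustration. Real triage involves judgment that a few booleans do not capture.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical description of one user flow, for illustration only."""

    name: str
    runs_forever: bool       # part of the long-lived regression wall
    must_be_identical: bool  # needs to be fast and deterministic on every run
    new_or_churning: bool    # shipped recently, or the UI still changes weekly
    exploratory: bool        # bug-bash hypothesis or one-off investigation
    for_non_engineers: bool  # meant to be read by a PM or reviewed against a ticket


def choose_tool(flow: Flow) -> str:
    """Return "framework" or "agent" following the rule of thumb above."""
    # Long-lived, fast, deterministic and stable: keep it in Playwright or Selenium.
    if flow.runs_forever and flow.must_be_identical and not flow.new_or_churning:
        return "framework"
    # New, churning, exploratory, or human-readable: hand it to the agent.
    if flow.new_or_churning or flow.exploratory or flow.for_non_engineers:
        return "agent"
    # Everything else defaults to code, where speed and determinism are cheap.
    return "framework"
```

The ordering matters: a stable regression flow stays in code even if a product manager also reads it, because speed and determinism win for anything that runs forever.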

How BrowserBash works, briefly

BrowserBash is a natural-language browser automation CLI. You write a plain-English objective, an AI agent drives a real Chrome or Chromium browser, and you get back a verdict plus structured results. There are no selectors and no page objects because the agent reads the page on each run. Install it once from npm:

npm install -g browserbash-cli

A few facts worth knowing as an SDET evaluating it:

Two engines. The default is Stagehand, the MIT-licensed AI browser-automation framework from Browserbase. The second is a builtin engine — an in-repo Anthropic tool-use loop driving Playwright that also captures a Playwright trace when you record. You rarely pick by hand; the default is what you want for most local runs.
Local LLMs first. BrowserBash is Ollama-first, so it runs against a free, local model with no API keys. It auto-detects Ollama, then Anthropic, then OpenRouter, and OpenRouter exposes free models such as openai/gpt-oss-120b:free. You can bring an Anthropic Claude key if you want, but you do not have to spend a cent to start.
Privacy by default. Nothing leaves your machine unless you explicitly pass --upload. For an SDET working against a staging environment behind a VPN, that default matters.
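The detection order described above can be pictured as a simple fallback chain. The Python sketch below is an illustration, not BrowserBash's implementation. It assumes Ollama is probed at its default local address (http://localhost:11434/api/tags, Ollama's model-listing endpoint) and that keys come from the conventional ANTHROPIC_API_KEY and OPENROUTER_API_KEY environment variables; both function names are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def ollama_is_running(url: str = "http://localhost:11434/api/tags",
                      timeout: float = 0.5) -> bool:
    """Return True if a local Ollama server answers on its default port."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # refused connection, timeout, or HTTP error (URLError is an OSError)
        return False


def pick_llm_provider(ollama_up: bool, env: Mapping[str, str]) -> Optional[str]:
    """Illustrative Ollama -> Anthropic -> OpenRouter fallback; not BrowserBash's code."""
    if ollama_up:
        return "ollama"                # free local model, no API key
    if env.get("ANTHROPIC_API_KEY"):   # assumed variable name
        return "anthropic"
    if env.get("OPENROUTER_API_KEY"):  # assumed variable name
        return "openrouter"            # can serve free models such as openai/gpt-oss-120b:free
    return None                        # nothing found: report a configuration problem
```

In a script you would call pick_llm_provider(ollama_is_running(), os.environ). Passing the probe result in as a boolean keeps the ordering logic testable without a network.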

The BrowserBash docs go deeper on each engine and provider; what follows is the part that changes your day-to-day.

Your first agent-driven check

Here is a one-off objective against a public demo site. It launches a real browser, performs the flow, and prints a verdict.

browserbash run "Go to the-internet.herokuapp.com/login, sign in with username tomsmith and password SuperSecretPassword!, and confirm the secure area page loads" --headless

The --headless flag runs Chrome without a visible window, which is what you want in CI or when you do not need to watch. Drop it and you can watch the agent work, which is genuinely useful the first few times you run it because you see exactly how it interprets your sentence.

Notice what is absent: no By.id, no getByRole, no explicit wait. You wrote the intent. The agent found the username field, the password field, and the submit button, then checked that the secure area appeared. If next sprint someone restructures that login form, this command keeps working as long as a human could still log in by looking at the page.

That is the core of what AI tools for SDETs buy you here: the test describes what the user does, not how the DOM is shaped this week.
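If you want to drive the one-off objective above from a Python script or test harness rather than a shell, a thin wrapper is enough. This is a sketch: build_run_argv and run_objective are hypothetical helper names, and the only CLI flag it relies on is --headless. Passing an argument list instead of a shell string means characters such as the ! in the password reach the CLI untouched.

```python
import subprocess
from typing import List, Sequence


def build_run_argv(objective: str, headless: bool = True,
                   extra_flags: Sequence[str] = ()) -> List[str]:
    """Build the argument list for: browserbash run "<objective>" [--headless] [...]."""
    argv = ["browserbash", "run", objective]
    if headless:
        argv.append("--headless")  # no visible browser window, as you want in CI
    argv.extend(extra_flags)
    return argv


def run_objective(objective: str, headless: bool = True) -> int:
    """Run one plain-English objective and return the CLI's exit code."""
    # A list of arguments (no shell=True) means "!" and quotes are not shell-interpreted.
    completed = subprocess.run(build_run_argv(objective, headless), check=False)
    return completed.returncode
```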

Committable tests SDETs will actually review

A one-off objective is great for a quick check, but real SDET work lives in version control and goes through pull requests. BrowserBash supports markdown tests for exactly this. You write a *_test.md file where each list item is one step, and the file is committable, reviewable, and diffable like any other artifact.

# Login smoke test

- Go to the-internet.herokuapp.com/login
- Type {{username}} into the username field
- Type {{password}} into the password field
- Click the login button
- Confirm the page shows "You logged into a secure area!"

Run it with:

browserbash testmd run login_test.md

This writes a Result.md next to the test so the outcome is itself a readable artifact. A few details that matter for a test you keep:

{{variables}} let you parameterize the test, and secret values are masked in output as *****, so a password never leaks into logs or the result file.
@import composes shared steps from other files, so a common "log in" sequence lives in one place and every test that needs it imports it. This is the agent-world equivalent of a page object: reuse without the maintenance tax, because the underlying steps are still plain English.

Because the file reads like a checklist, the product manager who owns the feature can review it in the same pull request as the code change and actually understand what it verifies. That review-ability is half the value. A Selenium page object is legible to engineers who know the codebase; a markdown test is legible to everyone who can read the ticket.
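To show how {{variables}} and @import fit together conceptually, here is a Python sketch that reads a markdown test's list items as steps, expands imports, substitutes variables, and masks secrets for logging. It illustrates the ideas and is not BrowserBash's parser: the "- @import <path>" line syntax, the in-memory files mapping, and every function name are assumptions.

```python
import re
from typing import AbstractSet, List, Mapping

# {{name}} placeholders, optionally with spaces inside the braces.
VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")
IMPORT_PREFIX = "@import "  # assumed syntax: a list item such as "- @import login_steps.md"


def parse_steps(markdown: str, files: Mapping[str, str],
                seen: AbstractSet[str] = frozenset()) -> List[str]:
    """Return the list items of a markdown test as steps, expanding @import items.

    `files` maps a path to its markdown text, standing in for the filesystem.
    """
    steps: List[str] = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # headings and prose are not steps
        step = stripped[2:].strip()
        if step.startswith(IMPORT_PREFIX):
            path = step[len(IMPORT_PREFIX):].strip()
            if path in seen:
                raise ValueError(f"circular @import of {path}")
            steps.extend(parse_steps(files[path], files, seen | {path}))
        else:
            steps.append(step)
    return steps


def render(step: str, variables: Mapping[str, str]) -> str:
    """Fill in {{name}} placeholders with real values (the text the agent acts on)."""
    return VAR_PATTERN.sub(lambda m: variables[m.group(1)], step)


def render_for_log(step: str, variables: Mapping[str, str],
                   secrets: AbstractSet[str]) -> str:
    """Same substitution, but secret values appear as ***** in logs and result files."""
    return VAR_PATTERN.sub(
        lambda m: "*****" if m.group(1) in secrets else variables[m.group(1)], step)
```

Keeping two render paths, one for the agent and one for humans, is what lets a real password drive the browser while never appearing in a log line.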

Wiring it into CI with no prose parsing

The thing that makes an AI tool usable in a real pipeline is a machine-readable contract, not pretty terminal output. BrowserBash's agent mode is built for this. Add --agent and the run emits NDJSON — one JSON event per line on a stable schema — instead of prose. Your CI job, or an AI coding agent orchestrating the run, consumes structured events without scraping human text.

browserbash run "Add a product to the cart and verify the cart count increments to 1" --agent --headless
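Consuming that stream takes very little code, because each line is a complete JSON document. The Python sketch below parses NDJSON and keeps the exit code alongside it. It deliberately assumes nothing about field names, since the event schema is documented separately; read_ndjson and run_agent are hypothetical helpers.

```python
import json
import subprocess
from typing import Any, Iterable, Iterator, List, Tuple


def read_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of an NDJSON stream."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # On a stable schema this should not happen; fail loudly instead of skipping.
            raise ValueError(f"NDJSON line {number} is not valid JSON: {exc}") from exc


def run_agent(objective: str) -> Tuple[List[Any], int]:
    """Run an objective in agent mode and return (parsed events, exit code)."""
    completed = subprocess.run(
        ["browserbash", "run", objective, "--agent", "--headless"],
        capture_output=True, text=True, check=False)
    return list(read_ndjson(completed.stdout.splitlines())), completed.returncode
```

Capturing stdout this way waits for the run to finish. For live progress you would iterate over the stdout of a subprocess.Popen process instead, feeding each line to the same parser.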

Exit codes are explicit and stable, which is what a CI gate actually keys on:

0 — passed
1 — failed
2 — error
3 — timeout

That means a GitHub Actions or GitLab CI step can branch on the exit code directly, the same way it would for a Playwright run, while the NDJSON stream gives you per-step detail for logs and dashboards. No regex against console output, no flaky string matching. For an SDET who has fought a flaky log parser, this is the detail that makes agent-driven checks trustworthy enough to gate a merge. There is a deeper write-up of the NDJSON event model and CI patterns on the BrowserBash blog.
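A CI script that wants to do more than pass the code through, for example to route a genuine check failure differently from a run that never produced a verdict, can map the codes explicitly. Here is a minimal Python sketch. The bucket names are hypothetical, and treating 2 and 3 as "no verdict" is an interpretation of "error" and "timeout", not documented behavior.

```python
from enum import IntEnum


class RunExit(IntEnum):
    """Exit codes documented for browserbash runs."""
    PASSED = 0
    FAILED = 1
    ERROR = 2
    TIMEOUT = 3


def classify(exit_code: int) -> str:
    """Bucket an exit code for CI routing (bucket names are illustrative)."""
    if exit_code == RunExit.PASSED:
        return "pass"
    if exit_code == RunExit.FAILED:
        return "check-failed"  # the flow ran and the expected outcome did not hold
    if exit_code in (RunExit.ERROR, RunExit.TIMEOUT):
        return "no-verdict"    # interpretation: the run itself did not finish normally
    return "unknown"


def blocks_merge(exit_code: int) -> bool:
    """Any non-zero code fails the gate, just as with a Playwright run."""
    return exit_code != RunExit.PASSED
```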

Recording evidence: screenshots, video, and traces

When an agent-driven check fails in CI, you want to see what happened, not guess. Pass --record and BrowserBash captures a screenshot and a session video — a .webm stitched together with ffmpeg — on any engine. If you are running the builtin engine, recording additionally captures a Playwright trace, which drops you straight into the time-travel debugger SDETs already know from Playwright.

browserbash run "Complete checkout as a guest and verify the order confirmation page" --record --headless

This closes a real gap. The classic objection to agent-driven testing is "I can't debug what I can't see." A failing run that hands you a video of the browser and, on the builtin engine, a Playwright trace is as debuggable as anything in your existing toolkit — arguably more, because you are watching the actual rendered flow rather than reconstructing it from a stack trace.

Running on a cross-browser grid

SDETs rarely get to test on one browser. The good news is that switching where the browser runs is a single flag. BrowserBash supports several providers: local (your own Chrome, the default), cdp (any DevTools endpoint), browserbase, lambdatest, and browserstack. The objective and the test file do not change — only the destination.

browserbash run "Log in and verify the account dashboard renders" --provider lambdatest --headless

The same plain-English objective now runs on a cloud grid instead of your laptop, against whatever browser and OS combination that grid provides. You write the intent once and retarget it, which is the same portability promise your framework's grid integration makes, without the per-browser locator quirks that sometimes leak into scripted suites.
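Retargeting one objective across several grids is then just a loop over provider names. This Python sketch only builds the command lines. The provider names come from the list above, while provider_matrix is a hypothetical helper. Cloud providers will need their own credentials, and cdp presumably needs an endpoint; this sketch configures neither.

```python
from typing import Dict, List, Sequence

# Provider names as listed in the article; "local" is the default.
PROVIDERS = ("local", "cdp", "browserbase", "lambdatest", "browserstack")


def provider_matrix(objective: str, providers: Sequence[str]) -> Dict[str, List[str]]:
    """Build one `browserbash run ... --provider <name>` command per provider."""
    matrix: Dict[str, List[str]] = {}
    for provider in providers:
        if provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {provider}")
        # The objective string is identical for every target; only the destination changes.
        matrix[provider] = ["browserbash", "run", objective,
                            "--provider", provider, "--headless"]
    return matrix
```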

A dashboard for run history, when you want one

Local runs are private by default, but sometimes you want history, replay, and a place for the team to look. BrowserBash gives you two options without forcing anything off your machine.

There is a free, private local dashboard:

browserbash dashboard

And there is a cloud dashboard for shared run history and per-run replay. Create a free account, connect once, then add --upload to any run to push it:

browserbash connect --key bb_your_key_here
browserbash run "Search for a product and verify results appear" --upload --headless

The cloud dashboard keeps run history, recordings, and per-run replay so the team can review a failure without re-running it locally. Nothing uploads unless you pass --upload, which keeps the privacy default intact for the runs you want to keep on your own machine. On the free tier, cloud runs are retained for 15 days, which is plenty for the recent-history view most teams actually use.

A realistic SDET workflow that uses both

Putting it together, here is how an SDET might actually split the work across a sprint, without abandoning Playwright or Selenium for a moment.

A new feature lands mid-sprint. Rather than block on writing locators, you write three plain-English markdown tests for the happy path and two edge cases, commit them, and wire them into CI with --agent so the exit code gates the branch. The product owner reviews the markdown in the pull request and confirms the tests match the acceptance criteria. The feature ships with coverage from day one.

A week later the UI for that feature gets a redesign. Your scripted regression tests for older, stable features keep running untouched in Playwright, because those screens have not moved. The new feature's markdown tests survive the redesign because the agent re-reads the page — no selector patching, no red build from a renamed class. Once the feature stabilizes and you know it will run forever, you can promote the most critical of those flows into a deterministic Playwright test for speed, treating the markdown version as the executable spec it was written from.

That is the shape of using AI tools for SDETs well: the agent absorbs the volatile, the new, and the human-readable; the framework owns the stable, the fast, and the deterministic. You can read more patterns like this in the BrowserBash learn hub, and the CLI itself is on npm if you want to try the commands above right now.

What this does not change

It is worth being honest about the limits, because credibility matters more than hype. An AI agent is slower per step than a scripted action, so it will never be the tool for a thousand-test regression sweep on every commit. It is non-deterministic, so two runs of the same objective can take slightly different paths to the same verdict, and you should treat a flaky agent run the way you would treat any flaky test — investigate, do not just retry blindly. And it depends on a model, so the quality of your verdicts tracks the quality of the model you point it at; a tiny local Ollama model is fine for a smoke check and may struggle with a subtle assertion.

None of those are reasons to avoid AI tools for SDETs. They are reasons to scope them correctly. Used inside the band of work they are good at — new coverage, high-churn UIs, exploratory passes, readable acceptance tests — they remove real, recurring overhead. Used as a drop-in replacement for a fast deterministic suite, they will disappoint. The SDET skill, as always, is putting the right work in the right place.
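One way to investigate a flaky agent run rather than retry it blindly is to run the same objective several times and look at the spread of outcomes. Here is a minimal Python sketch over a list of exit codes, collected for example from repeated browserbash run invocations; flakiness_report is a hypothetical name and the wording of its summaries is arbitrary.

```python
from collections import Counter
from typing import Sequence


def flakiness_report(exit_codes: Sequence[int]) -> str:
    """Summarize repeated runs of one objective instead of retrying until green."""
    if not exit_codes:
        raise ValueError("need at least one run")
    counts = Counter(exit_codes)
    passes = counts[0]
    if passes == len(exit_codes):
        return "stable pass"
    if passes == 0 and len(counts) == 1:
        return f"stable non-pass (exit code {exit_codes[0]})"
    # Mixed outcomes: report the full distribution so someone can dig in.
    return f"flaky: {passes}/{len(exit_codes)} passed, exit codes {dict(sorted(counts.items()))}"
```

A stable non-pass points at the product or the objective's wording. A mixed distribution points at timing, the model, or an ambiguous instruction, which is where a --record run pays off.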

FAQ

Will an AI testing tool replace Playwright or Selenium for an SDET?

No, and you should be skeptical of anything that claims it will. Playwright and Selenium remain the right tools for large, fast, deterministic regression suites, and BrowserBash does not try to take that over. The realistic use is complementary: the agent handles new, high-churn, exploratory, and human-readable tests, while your framework owns the stable regression wall. Most teams run both in the same pipeline.

How does BrowserBash handle flaky tests caused by changing selectors?

It sidesteps that class of flakiness by not using fixed selectors at all. The agent re-reads the live page on every run and locates elements the way a person would, so a renamed data-testid or a regenerated CSS class does not break the test as long as a human could still complete the flow by looking at the page. This is the main reason it fits high-churn UIs that would otherwise generate constant selector-maintenance work.

Can BrowserBash run in CI like my existing automation?

Yes, and it is designed for it. Run with --agent and it emits NDJSON on a stable schema instead of prose, and it returns explicit exit codes (0 passed, 1 failed, 2 error, 3 timeout) that a CI gate can branch on directly. Add --record to capture a screenshot and session video of the run so you can review any failure, and --headless to run without a visible browser window.

Is BrowserBash actually free to use?

Yes. BrowserBash is free and open source under the Apache-2.0 license, and it is Ollama-first, so it runs against a free local model with no API keys required. You do not need a paid plan to get started, and nothing leaves your machine unless you explicitly pass --upload to push a run to the cloud dashboard.

Try it on your own suite

If you own a Playwright or Selenium suite and you have been patching selectors instead of adding coverage, point BrowserBash at the gap. It is free and open source, it runs locally with no API keys, and you can be running your first plain-English check in the time it took to read this. Create a free account to unlock the cloud dashboard with run history and per-run replay, install the CLI from npm, and let the agent absorb the work your framework was never the fast path for.