Midscene Alternatives for AI UI Automation

Name: BrowserBash
Author: The Testing Academy

If you are searching for Midscene alternatives, you have probably already tried letting an AI drive your UI instead of hand-writing selectors, and you have hit one of two walls: either the tool wants you to wire its SDK into your own code, or you are not sure how to make it run unattended in CI. Midscene is a genuinely good library for AI UI automation. It lets you describe what you want in natural language and have a model carry it out against a real page. But "describe it in JavaScript and call an SDK" is one design choice, not the only one, and it does not fit every team. This guide compares the AI UI-automation tools worth evaluating in 2026, including Stagehand, ZeroStep, and Shortest, and shows where a CLI-and-Markdown approach beats an in-code SDK for teams that want runnable tests rather than another dependency in their app.

What Midscene actually is, and why people look for alternatives

Midscene (sometimes written Midscene.js) is an open-source project for AI-driven UI automation. The core idea is that you stop maintaining brittle CSS or XPath selectors and instead tell a model what to do: "click the login button," "type the search term," "assert the cart shows two items." It ships integrations that let you call this from inside test code and automation scripts, and it can use vision-capable models to reason about what is on screen. As of 2026, it has a real following among engineers who want the selector-free experience without leaving their existing JavaScript or TypeScript stack.

So why do people go looking for Midscene alternatives at all? A few recurring reasons show up in real evaluations:

You do not want an SDK in your codebase. Midscene's natural home is inside code you write and own. For a lot of QA teams, especially ones without a dedicated automation engineer, "now write some JavaScript that imports a library and constructs an agent" is exactly the friction they were trying to escape.
You want committable, human-readable tests. A describe/it block full of AI calls is still code. Some teams want a test artifact a non-developer can read, review in a pull request, and reason about without parsing the surrounding harness.
Cost and data residency. Vision-driven automation can lean on hosted models, and depending on how you configure it, that means tokens and screenshots leaving your network. Teams in regulated environments want a local-first option.
CI ergonomics. Getting structured pass/fail signal and artifacts out of an SDK into a pipeline is doable, but it is plumbing you have to build.

None of these make Midscene a bad tool. They are reasons a different shape might fit your team better. Let's look at the field.

The Midscene alternatives landscape in 2026

The tools that compete with Midscene split into a few camps. There are SDK-style libraries you call from your own code (Stagehand, ZeroStep, Midscene itself). There are AI test runners that fold natural language into a test framework (Shortest). And there is the CLI-and-Markdown camp, where the test is a plain-English objective or a committable Markdown file you run from a command line (BrowserBash). Knowing which camp you want narrows the field fast.

Stagehand

Stagehand is an open-source (MIT) framework from Browserbase that extends Playwright with AI-powered, natural-language actions like act, extract, and observe. Instead of writing a selector, you write page.act("click the sign in button") and a model figures out the rest, falling back to deterministic Playwright when you want it. It is one of the strongest Midscene alternatives if you already live in Playwright and want to keep your existing test structure while sprinkling AI where selectors are painful.

Stagehand's big advantage is that it is incremental. You do not throw away your Playwright suite; you upgrade the flaky parts. Its trade-off is the same one Midscene has: it is a library you import into code you maintain. You are still writing and owning a test program. For teams that want that, Stagehand is excellent. For teams that wanted to stop writing test programs, it is the same shape in a different package. Worth noting: BrowserBash actually uses Stagehand as its default engine under the hood, so this is less "either/or" than it looks; more on that below.

ZeroStep

ZeroStep is an AI plugin for Playwright that exposes an ai() function: inside a Playwright test you call ai("click the checkout button") and a model interprets the instruction against the live page. It is a clean, focused way to add natural-language steps to a Playwright suite without rebuilding it. As a Midscene alternative it sits in the same SDK camp: tightly scoped, and living inside your test code.

ZeroStep's pricing and hosted-service details have shifted over time and are not something I will quote from memory; check their current terms directly before you commit, especially around whether instructions are processed by a hosted service. The architectural point that matters for this comparison is stable: ZeroStep is something you call from inside Playwright test code, not a standalone runner you point at an objective. If your team is comfortable in Playwright and just wants a few AI steps, it is a reasonable pick. If you wanted to get out of writing Playwright code, it does not change that.

Shortest

Shortest is an open-source AI test runner where you write tests in natural language and a model executes them, built around the idea that a test like "the user can log in and see their dashboard" should be runnable without you scripting every click. It is closer to the "runnable plain-English test" philosophy than the pure SDKs, which makes it an interesting Midscene alternative for teams who specifically want natural-language tests rather than AI-augmented code.

Shortest's exact feature surface, model support, and project status evolve, so treat specifics as "check the repo" rather than gospel. Conceptually it overlaps a lot with what BrowserBash does (natural-language tests, AI execution), but the packaging differs: where Shortest is a test runner you adopt as a framework, BrowserBash is a CLI plus a committable Markdown format with a local-first model story. If Shortest's framework fits your repo, it is a fine choice. The comparison below is about where each shape shines.

Comparison table: Midscene alternatives at a glance

Here is the honest, high-level shape of each tool. Where a fact is not publicly nailed down, I have said so rather than inventing it.

Tool	Type	License	Authoring model	Local models?	Best for
Midscene	AI UI-automation library	Open source	SDK called from your code	Configurable (model-dependent)	JS/TS teams wanting selector-free actions in their own code
Stagehand	AI + Playwright framework	MIT	SDK (`act`/`extract`/`observe`) in Playwright	Model-dependent	Playwright teams upgrading flaky selectors incrementally
ZeroStep	AI plugin for Playwright	Open source plugin	`ai()` call inside Playwright tests	Check current terms	Adding NL steps to an existing Playwright suite
Shortest	AI natural-language test runner	Open source	NL tests run by a framework	Check current repo	Teams wanting runnable plain-English tests as a framework
BrowserBash	NL automation CLI + Markdown tests	Apache-2.0	CLI objective or committable `*_test.md`	Yes, Ollama-first, $0 on local	Teams wanting runnable tests and CI signal without an SDK

A note on reading this table: "Type" is the most decision-relevant column. If you want a library, Midscene, Stagehand, and ZeroStep are your shortlist. If you want a runner or a CLI, look at Shortest and BrowserBash. The license and local-model columns matter most for regulated teams and anyone trying to keep their model bill at zero.

Where BrowserBash fits: a CLI-and-Markdown alternative

BrowserBash is a free, open-source (Apache-2.0) natural-language browser automation CLI from The Testing Academy. It is deliberately not an SDK. You do not import it into your application code or construct an agent object in JavaScript. You install one command-line tool, write a plain-English objective, and the AI agent drives a real Chrome or Chromium browser step by step (no selectors, no page objects) and returns a verdict plus structured results.

The install and run loop is about as short as it gets:

npm install -g browserbash-cli

browserbash run 'log in to the demo store, add a laptop to the cart, complete checkout, and verify the page shows "Thank you for your order!"'

That single command opens a real browser, reasons through each step, and tells you whether the flow passed. There is no harness to scaffold, no describe block, no SDK call graph. For a QA engineer who wanted to stop writing automation programs, that is the whole pitch.

Two design choices set it apart from the SDK-style Midscene alternatives.

It uses Stagehand under the hood, by default

BrowserBash's default engine is Stagehand (MIT, by Browserbase), with a second builtin engine that runs an in-repo Anthropic tool-use loop. So if you liked Stagehand's reasoning but did not want to write Playwright code around it, BrowserBash gives you that engine wrapped in a CLI. You get the AI-action quality without owning the test program. You switch engines with a flag when you need the builtin loop's extras, like a Playwright trace you can open in the trace viewer.

Tests are committable Markdown, not code

This is the part that genuinely differs from every SDK in this roundup. BrowserBash lets you write *_test.md files where each list item is a step. A non-developer can read it, a reviewer can approve it in a pull request, and it lives in version control next to your app. It supports @import composition so you can reuse a login flow across many tests, and {{variables}} templating so you can parameterize environments and credentials. Secret-marked variables are masked as ***** in every log line, which matters the moment you put a real password in a test.

browserbash testmd run ./checkout_test.md \
  --var baseUrl=https://staging.example.com \
  --secret password=$STORE_PASSWORD

After each run it writes a human-readable Result.md. You can learn more about the Markdown format and the broader workflow in the BrowserBash learn docs.

The model story: local-first and a real $0 option

A recurring worry with AI UI automation is the bill, along with the data leaving your network. BrowserBash is Ollama-first: by default it uses free local models, with no API keys and nothing leaving your machine. It auto-resolves a local Ollama install first, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY, so you can start fully local and graduate to a hosted model only when you choose to. It supports OpenRouter (including genuinely free hosted models such as openai/gpt-oss-120b:free) and Anthropic's Claude if you bring your own key.
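If you want to know ahead of time which model source a given machine or CI runner is likely to land on, the resolution order described above can be sketched as a small shell check. This is an illustration only, not BrowserBash's own code: the function name predict_model_source and the OLLAMA_BIN override are hypothetical, and treating "ollama is on PATH" as "a local Ollama install exists" is an assumption.

```shell
# Hypothetical helper that mirrors the documented order:
# local Ollama first, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY.
# Assumption: an "ollama" executable on PATH means Ollama is installed.
# OLLAMA_BIN is an illustrative override, not a BrowserBash setting.
predict_model_source() {
  if command -v "${OLLAMA_BIN:-ollama}" >/dev/null 2>&1; then
    echo "ollama"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"
  elif [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo "openrouter"
  else
    echo "none"
  fi
}

# Print the predicted source for the current environment.
predict_model_source
```

A check like this is handy as a CI preflight step: if it prints "anthropic" or "openrouter" on a runner you expected to stay local, you know a hosted key is set and Ollama is missing before any tokens are spent.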

You can guarantee a $0 model bill by running on local models. That is a meaningful difference from tools whose natural path is a hosted vision model.

Now the honest caveat, because credibility beats hype: very small local models, roughly 8B parameters and under, can be flaky on long, multi-step objectives. They lose the thread on a ten-step checkout. The sweet spot is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model for genuinely hard flows. If your hardware can only run a tiny model and your flows are long, set expectations accordingly or reach for a hosted model on the hard tests. This is true of AI UI automation in general, not just BrowserBash, but it is worth saying plainly.

CI and AI-agent ergonomics: the part SDKs make you build

If you want any of these tools in a pipeline, you need machine-readable output and clean exit codes. With an SDK, that is plumbing you write. BrowserBash ships it.

Agent mode (--agent) emits NDJSON, one JSON event per line, on stdout. Exit codes are unambiguous: 0 passed, 1 failed, 2 error, 3 timeout. There is no prose to parse, which is exactly what you want when the consumer is a CI job or an AI coding agent orchestrating the run.

browserbash run "open the pricing page and verify the Pro plan \
  shows a monthly and an annual price" \
  --agent --headless

Pipe that into a CI step, branch on the exit code, and you have a smoke test without writing a test program. Add --record to capture a screenshot and a full .webm session video via ffmpeg on any engine; on the builtin engine you also get a Playwright trace. That recording story is something you would otherwise assemble by hand around an SDK.
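Branching on the exit code can be as small as the sketch below. The four codes are the ones listed above; the function names verdict_for_exit_code and run_and_report are hypothetical helpers, and nothing here assumes anything about the fields inside the NDJSON events, which are simply saved to a file for later inspection.

```shell
# Translate the documented exit codes into a short label.
verdict_for_exit_code() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed" ;;
    2) echo "error" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;
  esac
}

# Run any command, save its stdout (the NDJSON stream in agent mode)
# to the file given as the first argument, print a one-line verdict,
# and hand the original exit code back to the caller.
run_and_report() {
  events_file="$1"
  shift
  "$@" > "$events_file"
  code=$?
  echo "verdict: $(verdict_for_exit_code "$code")"
  return "$code"
}
```

In a pipeline step you would call it as run_and_report events.ndjson browserbash run "open the pricing page and verify the Pro plan shows a monthly and an annual price" --agent --headless. The step then fails or passes on the returned code, and events.ndjson can be kept as a build artifact.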

For where the browser actually runs, one --provider flag switches between local (your Chrome, the default), cdp (any DevTools endpoint), browserbase, lambdatest, and browserstack. So you can develop locally and run the same objective on a cloud grid for cross-browser coverage:

browserbash run "complete signup with a fake email and verify the \
  welcome screen appears" --provider lambdatest --record --upload

There is no account required to run anything. An optional, strictly opt-in free cloud dashboard (run history, video recordings, per-run replay) is available via browserbash connect plus --upload, with free uploaded runs kept for 15 days. If you want history without uploading anything, browserbash dashboard gives you a fully local dashboard.

When to choose each tool

Here is the balanced, genuinely useful part. None of these is the universal answer.

Choose Midscene when

You are a JavaScript or TypeScript team, you want selector-free UI actions inside code you already own, and an SDK in your repo is a feature, not a bug. If you like driving automation from your own program and want fine-grained control over how the agent is constructed and invoked, Midscene is squarely built for you. The CLI approach would feel like a step away from the control you want.

Choose Stagehand when

You already have a Playwright suite and the pain is specific: a handful of flaky selectors, a few flows that break every redesign. Stagehand lets you keep everything and AI-upgrade only the painful parts, with deterministic Playwright fallback. It is the lowest-disruption Midscene alternative for an established Playwright shop. (And remember, if you like Stagehand but not the code, BrowserBash wraps it in a CLI.)

Choose ZeroStep when

You want the smallest possible footprint, just an ai() call inside Playwright tests, and you are fine with the hosted-service model. Confirm current pricing and data handling first. It is a focused tool that does one thing.

Choose Shortest when

You want runnable natural-language tests but you specifically want them as a framework adopted into your repo, and its conventions fit how your team works. It overlaps heavily with BrowserBash's philosophy; the deciding factor is packaging and your repo's shape.

Choose BrowserBash when

You want runnable tests rather than an in-code SDK, you want committable Markdown your whole team can read and review, you care about a local-first model story with a real $0 path, and you want first-class CI output (NDJSON, exit codes) and recordings without building the plumbing. If "stop writing automation programs, start running plain-English objectives" describes your goal, this is the shape that matches. You can see real end-to-end flows in the case study.

The honest line: if you want a library, BrowserBash is not it; pick Midscene, Stagehand, or ZeroStep. If you want a runnable, committable test and clean CI signal, BrowserBash is built for exactly that.

A realistic migration path from an SDK

Say you are on Midscene or ZeroStep today and curious about the CLI-and-Markdown model. You do not have to rip anything out. A low-risk path:

Pick one fragile flow that breaks often (login plus a key action is a good candidate) and write it once as a *_test.md file. Each step is one list item in plain English.
Run it locally with browserbash testmd run, starting on a local Ollama model so there is no cost and nothing leaves your machine. If the flow is long and a small model gets lost, bump to a 70B-class local model or a hosted model just for that test.
Parameterize with {{variables}} and mark the password as a secret so it is masked in logs.
Wire it into CI with --agent and branch on the exit code. Add --record --upload when you want video for a failing run.
Compare maintenance cost over a few sprints. The thing you are measuring is whether a Markdown file your team can read and review is cheaper to keep green than an SDK call inside a test program.
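Once a second or third flow has been migrated, the CI step in the path above usually grows into a loop over every *_test.md file. A minimal sketch, assuming the runner command accepts the test file path as its last argument; run_markdown_suite is a hypothetical helper name, not part of BrowserBash.

```shell
# Run every *_test.md file in a directory through a runner command.
# Succeeds only if every file's run exits with 0.
# Assumption: the runner takes the test file path as its last argument.
run_markdown_suite() {
  suite_dir="$1"
  shift
  failed=0
  for test_file in "$suite_dir"/*_test.md; do
    [ -e "$test_file" ] || continue   # the glob matched nothing
    "$@" "$test_file"
    code=$?
    if [ "$code" -eq 0 ]; then
      echo "PASS $test_file"
    else
      echo "FAIL $test_file (exit $code)"
      failed=$((failed + 1))
    fi
  done
  echo "$failed test file(s) did not pass"
  [ "$failed" -eq 0 ]
}
```

A call such as run_markdown_suite ./tests browserbash testmd run runs each file in turn and leaves a PASS or FAIL line per file in the build log, which also gives you a simple record for the maintenance comparison over a few sprints.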

If the answer is no, you have lost an afternoon and kept your existing suite. If the answer is yes, you have a path off the SDK that your non-developers can actually contribute to. Plenty more worked examples live on the BrowserBash blog.

FAQ

What is the best Midscene alternative for teams that do not want an SDK?

If you specifically want to avoid wiring a library into your own code, BrowserBash is the closest fit, because it is a CLI and a committable Markdown test format rather than an in-code SDK. You install one command-line tool, write a plain-English objective or a *_test.md file, and run it. Shortest is also worth a look if you prefer a natural-language test framework adopted into your repo. Midscene, Stagehand, and ZeroStep are all SDKs by design.

Is Midscene free and open source?

Midscene is an open-source project for AI UI automation, so the library itself is free to use. Your real cost is usually the model: if you point it at a hosted vision model, you pay for tokens. BrowserBash takes a different angle by defaulting to free local models through Ollama, which lets you guarantee a $0 model bill, with hosted Anthropic or OpenRouter models available only if you opt in.

Can these AI UI-automation tools run in CI?

Yes, but the ergonomics differ. SDK-style tools like Midscene, Stagehand, and ZeroStep can run in CI, but you build the structured output and exit-code handling yourself. BrowserBash ships an agent mode that emits NDJSON and uses explicit exit codes (0 passed, 1 failed, 2 error, 3 timeout), so a pipeline or an AI coding agent can consume results without parsing prose.

Do I need a powerful machine to run AI UI automation locally?

For short flows, a modest local model is usually fine. For long, multi-step objectives like a full checkout, very small models, around 8B parameters and under, can lose the thread and get flaky. The reliable sweet spot is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model for the hardest flows. You can mix and match: run cheap flows locally and reserve a hosted model for the tough ones.

If you want runnable, committable tests instead of another SDK in your codebase, BrowserBash is worth ten minutes of your time. Install it with npm install -g browserbash-cli, point it at a real flow, and watch a plain-English objective run against an actual browser. No account is required to run anything; if you later want hosted run history and video replay, you can opt in by creating a free account at browserbash.com/sign-up.