Replace Rainforest QA With an AI Agent Testing CLI

If you are looking for an AI-driven Rainforest QA alternative that your team can actually self-host, the usual trigger is one of three things: the bill grew faster than your test suite, your test runs depend on a vendor cloud you cannot put sensitive data into, or your pipeline cannot branch cleanly on a pass/fail signal without scraping a dashboard. Rainforest QA earned its place by making functional tests readable to non-engineers and, historically, by pairing that with on-demand human testers. This article is about moving those same plain-English tests onto an AI agent you run yourself: keeping the readability your team likes while changing where the work happens, who pays for it, and how your CI reads the result.

I am not here to dunk on Rainforest QA. It solved a real problem: most teams cannot staff enough manual QA to cover every release, and writing Selenium for every flow is expensive and brittle. Rainforest's answer — readable tests plus, in its earlier model, a crowd of human testers, and later more AI-assisted no-code execution — is a legitimate one. The question this guide answers is narrower. If the crowd model is too slow or too costly for your release cadence, or if you need everything to run on infrastructure you control, what do you move to, and what do you trade away? The honest answer is that you give up a few things and gain a few others, so let's be specific about both.

What Rainforest QA actually does (and where it pinches)

Rainforest QA is a commercial, cloud-hosted functional and end-to-end testing platform. Its core pitch over the years has centered on no-code test authoring that non-engineers can read and maintain, with execution handled in Rainforest's environment. In its earlier and best-known form, it leaned heavily on a crowd of human testers who would execute test steps and report results; more recent positioning has emphasized AI-assisted, no-code automated runs. Exact current pricing, the precise mix of human-versus-automated execution, and internal model details are not something I will invent here — those specifics are commercial and shift over time, so treat anything you read (including this) as "verify on their site, as of 2026."

What is stable enough to plan around is the shape of the product, and the shape is where the friction shows up for some teams:

It is a hosted SaaS. Tests run in Rainforest's cloud against your application. For a public marketing site that is fine. For an internal admin tool behind a VPN, a pre-release build, or an app handling regulated data, "the test runs in a vendor's environment" is a constraint you have to design around.
Pricing is commercial and seat- or plan-shaped. The historical crowd model in particular carried a per-execution cost (human time is not free), which meant cost scaled with how often and how broadly you tested. That is the opposite of what you want when you are trying to test more.
The CI contract is mediated by the platform. You trigger runs and read results through Rainforest's system rather than getting a raw exit code from a binary in your own pipeline step.

None of those are defects. They are design choices that fit some teams and pinch others. If they pinch you, the rest of this article is the migration.

The shape of the swap: from crowd/no-code runs to AI agent objectives

The thing to hold onto from Rainforest is the part your team genuinely values: tests written in plain English that a product manager or a junior tester can read without learning a selector syntax. The thing to drop is the dependency on a vendor cloud and a human or hosted-AI crowd to execute them.

BrowserBash is a free, open-source (Apache-2.0) command-line tool that keeps the first part and replaces the second. You write an objective in plain English. An AI agent reads it, drives a real Chrome or Chromium browser step by step — no selectors, no page objects, no recorded scripts — and returns a verdict plus structured results. Instead of "a tester (or a hosted agent) somewhere runs my steps," the model is "an AI agent on my machine, or in my CI runner, runs my objective and tells me what happened."

Here is the same checkout flow you might have had in Rainforest, expressed as a BrowserBash objective:

browserbash run 'Go to the store, log in with the test account, add the blue running shoes to the cart, complete checkout with the saved card, and verify the page shows "Thank you for your order!"'

No element IDs. No click(#checkout-btn). The agent figures out the steps the way a human reading that sentence would, drives the browser, and reports whether it reached the goal. That is the same readability Rainforest sells, with the execution moved under your control.

Where the browser runs is now your choice

Rainforest decides where your tests execute. BrowserBash makes that a flag. The --provider option chooses where the browser actually runs:

local (default) — your own Chrome on your machine or CI runner.
cdp — any Chrome DevTools Protocol endpoint you point it at.
browserbase, lambdatest, browserstack — managed cloud browser grids, when you want scale or cross-environment coverage.

So you can keep everything on a self-hosted runner for a sensitive internal app, then flip one flag to fan out across a cloud grid for broad browser coverage on your public site, all from the same objective.
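
As a rough sketch of that one-flag switch, the wrapper below picks a --provider value from the kind of target being tested. The function names, the TARGET_KIND variable, and the internal/public split are assumptions for illustration; only the provider names and the --provider flag come from the list above. Cloud grids will presumably need their own credentials, which this sketch does not cover.

```shell
#!/usr/bin/env bash
# Sketch: choose where the browser runs based on what is being tested.
# pick_provider, run_objective and TARGET_KIND are hypothetical names,
# not part of BrowserBash itself.

pick_provider() {
  # $1: kind of target ("internal" or "public"); anything else stays local.
  case "$1" in
    public)   echo "browserbase" ;;  # fan out on a managed cloud grid
    internal) echo "local" ;;        # keep the browser on your own runner
    *)        echo "local" ;;        # conservative default
  esac
}

run_objective() {
  # $1: plain-English objective, $2: target kind
  browserbash run "$1" --provider "$(pick_provider "$2")"
}

# Example (commented out so sourcing this file has no side effects):
# run_objective "Open the pricing page and confirm the plans table loads" "${TARGET_KIND:-internal}"
```

Keeping the decision in one small function means the objective text never changes when the execution location does.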

The model story: a genuine $0 path, and an honest caveat

This is the part that changes the economics, so I want to be precise and not oversell it.

BrowserBash is Ollama-first. By default it uses free local models through Ollama: no API keys, nothing leaves your machine. It auto-resolves a model provider (separate from the browser --provider above) in this order: local Ollama, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY. If Ollama is running, it is picked first, so you get a literal $0 model bill: inference happens on your own hardware and your application's pages never get shipped to a third-party LLM. For teams that moved off Rainforest specifically because they did not want page content leaving their environment, this is the headline.
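
If you want to confirm ahead of a run which backend that resolution order would land on, a pre-flight check along these lines may help. It is a sketch that mirrors the documented order, not BrowserBash's own logic. The probe assumes Ollama's default port (11434), and the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: predict which model backend a run would resolve to, following the
# documented order (Ollama, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY).
# This is an illustration, not BrowserBash's implementation.

ollama_is_up() {
  # Succeeds if a local Ollama server answers on its default port.
  curl --silent --fail --max-time 2 http://localhost:11434/api/tags >/dev/null 2>&1
}

predict_model_backend() {
  if ollama_is_up; then
    echo "ollama"        # local inference, no API key
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"     # bring-your-own Claude key
  elif [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo "openrouter"    # hosted models, including free ones
  else
    echo "none"          # nothing available to resolve
  fi
}
```

A CI job that must stay on local inference could refuse to start unless predict_model_backend prints ollama.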

You are not locked into local, though. BrowserBash supports OpenRouter — including genuinely free hosted models such as openai/gpt-oss-120b:free — and Anthropic's Claude models if you bring your own key. The point is that you own the model decision and the bill, rather than inheriting a vendor's bundled inference cost.

Now the honest caveat, because credibility matters more than a clean pitch: very small local models (around 8B parameters and under) can be flaky on long, multi-step objectives. A ten-step checkout-and-refund flow can drift or lose the thread on a tiny model. The sweet spot is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model for the genuinely hard flows. If you try BrowserBash with a 3B model on a complex journey and it stumbles, that is expected — size up the model, do not write off the approach. For short, well-scoped objectives (log in, check a banner, submit a form), even smaller models do fine.

Rainforest QA vs. BrowserBash: an honest comparison

Here is the head-to-head. I have hedged anywhere the competitor's specifics are commercial or not publicly fixed.

| Dimension | Rainforest QA | BrowserBash |
| --- | --- | --- |
| Authoring model | No-code, plain-English test steps | Plain-English objectives driven by an AI agent |
| Who executes | Rainforest cloud (human crowd historically; AI-assisted no-code more recently) | An AI agent you run locally or in your CI |
| Where it runs | Vendor cloud | Local Chrome by default; `cdp`, Browserbase, LambdaTest, BrowserStack via `--provider` |
| Pricing | Commercial, plan/seat-shaped (verify current, as of 2026) | Free, open source (Apache-2.0) |
| Model / inference cost | Bundled into the platform; not separately specified | Your choice; $0 on local Ollama, or BYO hosted key |
| Data residency | Pages processed in vendor environment | Stays on your machine on local models |
| CI contract | Mediated via platform runs and results | Native exit codes + NDJSON via `--agent` |
| Test-as-code | Platform-managed test definitions | Committable `*_test.md` files in your repo |
| Best fit | Teams wanting a managed, hands-off no-code QA service | Teams wanting self-hosted, scriptable, cost-controlled AI testing |

The table is not a knockout. Rainforest's managed, hands-off character is a real advantage if you do not want to own infrastructure or a model. The trade you are evaluating is "managed convenience and a crowd/hosted execution layer" against "self-hosted control, $0-capable inference, and a raw CI signal." Pick the constraint that hurts more.

The CI win: real exit codes and NDJSON instead of dashboard scraping

If you have ever wired a hosted test platform into a pipeline, you know the pain: trigger a run, poll an API, parse a result payload, and hope the schema does not change. BrowserBash flips that. It is a binary in your pipeline step, and it speaks the language CI already understands.

Run any objective with --agent and you get NDJSON — one JSON event per line — on stdout, plus stable exit codes:

0 — passed
1 — failed
2 — error
3 — timeout

That means your pipeline can branch on $? with no prose parsing and no dashboard scraping:

browserbash run "Log in as the standard user and confirm the dashboard loads with today's date" \
  --agent --headless --provider local

if [ $? -eq 0 ]; then
  echo "Smoke test passed"
else
  echo "Smoke test failed — blocking deploy"
  exit 1
fi
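
The branch above treats every non-zero code the same. If you want to tell a genuine test failure apart from an infrastructure error or a timeout, a small mapping like the following may be useful. The function names are illustrative; the four codes are the ones listed earlier.

```shell
#!/usr/bin/env bash
# Sketch: translate the documented exit codes into labels and a deploy
# decision. verdict_for_exit_code and should_block_deploy are made-up names.

verdict_for_exit_code() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed" ;;
    2) echo "error" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;   # anything outside the documented set
  esac
}

should_block_deploy() {
  # Only a clean pass lets the deploy through; every other code blocks it.
  [ "$1" -ne 0 ]
}

# Example (commented out; requires the CLI to be installed):
# status=0
# browserbash run "Log in as the standard user" --agent --headless || status=$?
# echo "verdict: $(verdict_for_exit_code "$status")"
```

Capturing the code with `|| status=$?` keeps the script alive under `set -e`, which many CI shells enable by default.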

For an AI coding agent or an orchestration script, the NDJSON stream is even more useful than the exit code: every step the agent takes is a structured event you can consume programmatically. There is no "read the dashboard, screenshot it, summarize it" step. The result is the data. This is the single biggest day-to-day difference once you have migrated — your green/red signal becomes a first-class part of the pipeline instead of a thing you integrate around.
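
To make "consume programmatically" concrete, here is a minimal sketch that reads an NDJSON stream line by line. It deliberately assumes nothing about the event schema, which is not described in this article: each non-empty line is treated as one opaque JSON event. The function name and the sample events in the comment are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: consume NDJSON from stdin without assuming any field names.
# summarize_ndjson is a hypothetical helper, not part of BrowserBash.

summarize_ndjson() {
  # Prints how many events were seen and echoes the final one.
  local line last="" count=0
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi   # skip blank lines defensively
    count=$((count + 1))
    last="$line"
  done
  printf 'events=%d\n' "$count"
  printf 'last=%s\n' "$last"
}

# Example with made-up events (placeholder fields, not the real schema):
# printf '%s\n' '{"type":"step","n":1}' '{"type":"result"}' | summarize_ndjson
```

When piping the CLI into a consumer like this in bash, remember that the pipeline's status is the consumer's, not the CLI's. Read `${PIPESTATUS[0]}` right after the pipeline if you still need the BrowserBash exit code.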

Keeping tests readable and version-controlled: Markdown tests

The strongest argument for Rainforest with non-engineers is readability: a PM can open a test and understand it. BrowserBash keeps that, and adds something Rainforest's platform-managed definitions do not give you as directly — your tests live in your repo, in plain Markdown, under the same code review and version history as everything else.

A BrowserBash Markdown test is a *_test.md file where each list item is a step. It supports @import for composing shared flows (a login block you reuse everywhere) and {{variables}} for templating. Variables marked as secret are masked as ***** in every log line, so credentials never leak into CI output. After each run it writes a human-readable Result.md you can hand to anyone.

# Checkout smoke test

@import ./flows/login_test.md

- Add the {{product_name}} to the cart
- Go to checkout
- Pay with the saved card ending {{card_last4}}
- Verify the page shows "Thank you for your order!"

Run it:

browserbash testmd run ./checkout_test.md \
  --var product_name="Blue Running Shoes" \
  --var card_last4="4242" \
  --secret password="$STORE_PASSWORD"

The password here is secret-marked, so it shows up as ***** in logs. This is the part that makes a migration durable: a PM reads the Markdown and understands the flow, a developer reviews it in a pull request, and CI runs it with real secrets that never appear in plaintext. You get Rainforest's readability and the discipline of test-as-code. For more patterns, the BrowserBash learn pages walk through composition and templating in depth.
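
For intuition about what templating and masking do to a step, the sketch below imitates both behaviours in a few lines of bash. It is not how BrowserBash implements them; the function names are hypothetical and the logic is the simplest thing that reproduces the described effect.

```shell
#!/usr/bin/env bash
# Sketch of the two behaviours described above, NOT BrowserBash's code:
# filling a {{name}} placeholder and masking a secret value in log output.

render_step() {
  # $1: step text, $2: variable name, $3: value
  # Note: on bash 5.2+ a value containing "&" would need extra care here.
  local step="$1" name="$2" value="$3"
  printf '%s\n' "${step//"{{$name}}"/$value}"
}

mask_secret() {
  # $1: log line, $2: secret value; every occurrence becomes *****
  local line="$1" secret="$2"
  if [ -z "$secret" ]; then
    printf '%s\n' "$line"   # nothing to mask
    return 0
  fi
  printf '%s\n' "${line//"$secret"/*****}"
}

# Example:
# render_step 'Add the {{product_name}} to the cart' product_name 'Blue Running Shoes'
```

The useful property to check in your own CI logs is the second one: the raw secret value should never appear, only the mask.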

A practical migration plan

You do not rip out a working QA process in an afternoon. Here is the sequence I would run, and have seen work, when moving readable tests off a hosted platform.

1. Inventory and triage your existing tests

Pull your current Rainforest test list and sort it into three buckets: high-value smoke tests that gate every deploy, broad regression flows, and rarely-run edge cases. Migrate the smoke tests first — they are short, they run constantly, and they are where a fast, free, self-hosted runner pays off immediately.

2. Rewrite as objectives, not step-by-step scripts

Resist the urge to transliterate every recorded click. The whole point of an AI agent is that you describe the goal, not the keystrokes. A 14-step Rainforest script often collapses into a two-sentence BrowserBash objective. Shorter objectives are also more robust to UI changes, because the agent re-reads the page each run instead of replaying frozen coordinates.

3. Pick your model tier per flow

Map flows to models. Short smoke checks run fine on a small or mid-size local model at $0. Long, branchy journeys — multi-step checkout with refunds, multi-role approval workflows — deserve a mid-size local model (Qwen3 / Llama 3.3 70B-class) or a capable hosted model. Do not put your hardest flow on your smallest model and conclude the tool is unreliable; that is a model-sizing problem, not an approach problem.

4. Wire exit codes into CI

Replace your platform-trigger-and-poll integration with a plain pipeline step that runs browserbash ... --agent --headless and branches on the exit code. If you are on Jenkins, GitHub Actions, GitLab CI, or anything else that reads $?, this is a few lines. The BrowserBash blog has CI-specific walkthroughs if you want a copy-paste starting point for your runner.
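
A deploy gate usually covers more than one objective. The sketch below runs several in sequence and returns the first non-zero exit code it sees. BROWSERBASH_BIN is a hypothetical override added here so the loop can be exercised with a stub; it is not a BrowserBash setting, and the function name is likewise made up.

```shell
#!/usr/bin/env bash
# Sketch: gate a deploy on several objectives.
# BROWSERBASH_BIN and run_gate are assumptions for this example.

run_gate() {
  # Args: one plain-English objective per argument.
  # Returns 0 only if every objective exits 0, else the first non-zero code.
  local bin="${BROWSERBASH_BIN:-browserbash}"
  local objective status gate=0
  for objective in "$@"; do
    status=0
    "$bin" run "$objective" --agent --headless || status=$?
    echo "exit=$status objective=$objective" >&2   # summary goes to stderr
    if [ "$status" -ne 0 ] && [ "$gate" -eq 0 ]; then
      gate=$status
    fi
  done
  return "$gate"
}

# Example (commented out; requires the CLI):
# run_gate "Log in and confirm the dashboard loads" \
#          "Add an item to the cart and confirm the cart count is 1"
```

Running every objective before returning, rather than stopping at the first failure, gives you a full picture of what broke in a single pipeline run.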

5. Add recording where you need evidence

When a test fails and someone asks "what actually happened," you want artifacts. The --record flag captures a screenshot and a full .webm session video (via ffmpeg) on any engine. On the builtin engine you additionally get a Playwright trace you can open in the trace viewer and step through.

browserbash run "Sign up a new user, verify the welcome email banner, and log out" \
  --record --headless

That .webm is the thing you attach to a bug report. It replaces the "watch the crowd tester's notes" evidence trail with a video you own.

6. Decide on a dashboard, if any

You may not want one; many teams are happy with Result.md and CI logs. If you do, BrowserBash gives you two free, opt-in options. The first is a fully local dashboard, started with browserbash dashboard. The second is the free cloud dashboard (run history, video recordings, per-run replay), which you enable by running browserbash connect and adding --upload to your runs. Free uploaded runs are kept for 15 days. Nothing uploads unless you ask it to, and no account is required just to run tests.
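
One way to keep that opt-in property explicit in CI is to build the flag list from environment variables, so an upload only happens when someone deliberately turns it on. RECORD_RUNS and UPLOAD_RUNS are hypothetical variable names chosen for this sketch; the flags themselves are the ones named in this article.

```shell
#!/usr/bin/env bash
# Sketch: keep recording and uploading strictly opt-in.
# build_run_flags, RECORD_RUNS and UPLOAD_RUNS are made-up names.

build_run_flags() {
  # Prints one flag per line; --upload appears only when explicitly requested.
  echo "--headless"
  if [ "${RECORD_RUNS:-0}" = "1" ]; then echo "--record"; fi
  if [ "${UPLOAD_RUNS:-0}" = "1" ]; then echo "--upload"; fi
}

# Example (commented out; assumes `browserbash connect` was run beforehand).
# The substitution is intentionally unquoted so each flag becomes its own word:
# browserbash run "Log in and confirm the dashboard loads" $(build_run_flags)
```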

Engines: stagehand and builtin

One more knob worth knowing during migration. BrowserBash ships two engines. The default is stagehand (MIT-licensed, by Browserbase), which handles the plain-English-to-browser-action translation well across a wide range of sites. The alternative is builtin, an in-repo Anthropic tool-use loop, which is the one that gives you the Playwright trace on --record. For most flows you will never touch this; if a particular site is fighting one engine, switching is a flag away. The default is a sensible starting point for everything you migrate.

When to choose Rainforest QA over BrowserBash

I promised an honest read, so here it is — the cases where you should not migrate, or should think hard first.

You genuinely want a fully managed service. If you do not want to own a CI runner, a model, or any infrastructure, and you would rather pay a vendor to make QA Somebody Else's Problem, a hosted platform is the right shape. BrowserBash assumes you are comfortable running a command-line tool in your own environment.
You need humans in the loop by design. If part of your value from Rainforest is real human judgment — exploratory testing, "does this feel right," subjective UX checks — an AI agent driving a browser does not replace a human's taste. It executes objectives; it does not have opinions about your design.
You have no engineering bandwidth at all. Migrating means writing objectives, wiring CI, and picking a model. It is not heavy, but it is non-zero. A no-code platform that an ops person clicks through may fit a team with zero developers better.
You want a single vendor SLA and support contract. Open source gives you the code and a community; it does not give you a phone number with a guaranteed response time. If procurement requires a vendor SLA, weigh that.

If two or more of those describe you, stay on the managed platform — or run both, using BrowserBash for fast self-hosted smoke checks and the hosted service for the broader managed coverage.

When BrowserBash is the better fit

And the other side, because this is the audience that benefits most:

Cost is scaling with your testing, not your value. If testing more costs more in a way that discourages coverage, a free, self-hosted, $0-on-local tool removes that disincentive entirely.
Data cannot leave your environment. Internal tools, pre-release builds, regulated data — on local models, pages never reach a third party. That is a hard requirement you can actually meet.
You live in CI and want a raw signal. Exit codes and NDJSON beat polling a dashboard every time you need a deploy gate.
You want tests in your repo. Committable *_test.md files under code review and version control, with secret masking, are a better long-term home for tests than a platform-managed store.
An AI coding agent is part of your workflow. The --agent NDJSON stream was built for exactly this — machines consuming machine output, no prose parsing.

You can see real end-to-end flows on the BrowserBash case study page, and current cost details on the pricing page (the CLI itself is free).

FAQ

Is there a free Rainforest QA alternative that uses AI?

Yes. BrowserBash is a free, open-source (Apache-2.0) CLI that uses an AI agent to drive a real browser from plain-English objectives. It is Ollama-first, so on local models you can run it with no API keys and a literal $0 inference bill, while keeping your pages on your own machine. You install it with npm install -g browserbash-cli and no account is required to run tests.

Can I keep my plain-English tests when I migrate off Rainforest QA?

Yes, and that is the main thing you keep. BrowserBash tests are plain-English objectives a non-engineer can read, and you can store them as committable Markdown *_test.md files where each list item is a step. You usually rewrite long recorded scripts as shorter goal-based objectives, which are also more resilient to UI changes than frozen step lists.

How does BrowserBash fit into a CI pipeline compared to a hosted platform?

BrowserBash runs as a binary in your pipeline step and returns standard exit codes — 0 passed, 1 failed, 2 error, 3 timeout — plus NDJSON output when you pass --agent. That lets CI branch on the result directly instead of triggering a run and polling a vendor API. There is no dashboard scraping and no prose parsing to integrate.

Are small local models reliable enough for complex test flows?

For short, well-scoped objectives, yes. For long multi-step journeys, very small models (around 8B and under) can be flaky and lose the thread. The reliable path for hard flows is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model — size the model to the difficulty of the flow rather than expecting a tiny model to handle everything.

Ready to try it? Install with npm install -g browserbash-cli, point an objective at your app, and watch an AI agent drive a real browser and hand you a verdict. When you want run history and video replay, the free dashboard is one sign-up away, but the CLI runs without an account.