RPA for web apps, done with AI

Name: BrowserBash
Author: The Testing Academy

If you have ever owned a fleet of web bots, you know the dread. RPA for web apps promised hands-off automation, and for the structured parts it delivered. Then a vendor shipped a redesign, a div lost a class name, a login page added a step, and three workflows died overnight. Someone paged you. You spent the afternoon re-recording selectors instead of building anything. The bot did exactly what it was told. The problem is what it was told: click the element at this exact path, every time, forever, as if the web never changes.

This article is about a different way to automate web applications. Instead of recording brittle click paths, you write an English objective, and an AI agent drives a real Chrome browser to satisfy it, step by step, with no selectors to maintain. You will see where this approach genuinely beats traditional RPA, where traditional RPA and OS-level computer-use agents still win, and how to run a working example today with BrowserBash, a free, open-source CLI. I will be honest about the trade-offs, because the worst thing you can do with automation is adopt the wrong tool for a problem it was never built to solve.

Why RPA for web apps gets brittle

Traditional RPA was designed for a world of stable, internal applications. The classic recorder watches you operate a screen, then replays your actions by targeting fixed coordinates, image matches, or hardcoded UI paths. That works beautifully when the target never moves. Web apps move constantly.

The fragility is not a bug in any one vendor's product. It is structural. A bot that depends on #root > div:nth-child(2) > form > button is betting that the page's DOM shape stays frozen. Modern front ends break that bet on a normal release cadence. Component libraries regenerate class names. A/B tests swap layouts for half your users. A framework upgrade re-nests the tree. Each change is invisible to a human, who still sees a blue "Submit" button and clicks it, but fatal to a selector-bound bot.
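The difference between a structural path and an intent-style lookup can be sketched in a few lines. This is a toy illustration using Python's standard library, not how BrowserBash or any RPA product is implemented; the two markup snippets and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two releases of the same page. The second wraps the form in an extra
# <section> and regenerates the button's class name.
BEFORE = """
<div id="root">
  <div class="header"/>
  <div class="content">
    <form><button class="btn-a1">Submit</button></form>
  </div>
</div>
""".strip()

AFTER = """
<div id="root">
  <div class="header"/>
  <div class="content">
    <section>
      <form><button class="btn-x9">Submit</button></form>
    </section>
  </div>
</div>
""".strip()


def find_by_path(root):
    """Structural lookup, analogous to '#root > div:nth-child(2) > form > button'."""
    return root.find("./div[2]/form/button")


def find_by_tag_and_text(root, tag, text):
    """Intent-style lookup: any element of this kind with this visible text."""
    for element in root.iter(tag):
        if (element.text or "").strip() == text:
            return element
    return None


before = ET.fromstring(BEFORE)
after = ET.fromstring(AFTER)

path_before = find_by_path(before)
path_after = find_by_path(after)
text_before = find_by_tag_and_text(before, "button", "Submit")
text_after = find_by_tag_and_text(after, "button", "Submit")
```

The path lookup finds the button before the change and returns None after it, while the text lookup finds the button in both versions. A human sees no difference between the two pages, which is exactly the gap the paragraph above describes.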

The cost shows up in maintenance. Industry write-ups in 2026 commonly cite annual RPA maintenance running around 20 percent of license value, with some teams reporting that a large share of their bots need regular attention just to keep running as target UIs drift. Add the people. Mid-to-large RPA programs typically staff a center of excellence with dedicated engineers for governance, monitoring, and bot repair. None of that work ships features. It just keeps the lights on against a tide of UI change you do not control, especially for third-party portals where you have zero say over the next redesign.

There is a second tax: licensing. Published list prices for the big platforms put unattended robots in the rough range of several thousand dollars per robot per year, and AI add-ons layer consumption charges on top. The exact numbers vary by contract and are not always public, so treat any single figure as indicative rather than gospel. The pattern, though, is consistent. You pay per bot, and you pay again to keep each bot alive.
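To make the "pay per bot, then pay again" pattern concrete, here is a small arithmetic sketch. The fleet size and per-bot license figure are hypothetical placeholders, not quoted prices; the 20 percent maintenance rate is the indicative figure mentioned earlier, and the function name is invented for illustration.

```python
def annual_fleet_cost(bots, license_per_bot, maintenance_percent=20):
    """Return yearly license, maintenance, and total cost for a bot fleet.

    All inputs are illustrative. Integer math keeps the result exact.
    """
    license_total = bots * license_per_bot
    maintenance = license_total * maintenance_percent // 100
    return {
        "license": license_total,
        "maintenance": maintenance,
        "total": license_total + maintenance,
    }


# Hypothetical fleet: 10 unattended bots at 5,000 per bot per year.
example = annual_fleet_cost(bots=10, license_per_bot=5_000)
```

Swap in your own contract numbers; the point is only that the maintenance line scales with every bot you add.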

The shift from recorded clicks to English objectives

The industry response in 2026 has a name: agentic automation. The idea is to put a reasoning model in the loop so the automation understands intent instead of memorizing keystrokes. When the model knows the goal is "log in and download last month's invoice," it can adapt when the invoice button moves, because it is looking for the invoice button, not for pixel coordinates.

This is a real shift, and the incumbents are chasing it. Microsoft has added generative AI to Power Automate so desktop flows can self-heal when an interface changes. UiPath has introduced selector technology that targets elements through application structure and, more recently, a runtime that turns natural-language instructions into multi-step UI automations that adapt on the fly. Independent vendors are building AI-native platforms where a vision-language model is the execution engine rather than a bolt-on. The common thread: describe the outcome, let the model figure out the path.

For web work specifically, this is where a focused tool shines. You do not need an agent that can also drive a 2009 desktop accounting app to automate a web portal. You need one that reads the page the browser already understands and acts on it reliably. That narrower scope is exactly what makes it cheaper, faster, and more deterministic, which is the entire argument for a browser-scoped approach.

How BrowserBash drives a browser from a plain-English goal

BrowserBash is a free, open-source command-line tool from The Testing Academy that automates web browsers from natural language. You install it once, give it an objective, and an AI agent opens a real Chrome or Chromium browser and works toward that objective step by step. There are no selectors in your script. There is no recorder session to redo when the page changes.

Here is the smallest possible example. You describe what you want, and BrowserBash returns a verdict plus any structured values it was asked to capture.

npm install -g browserbash-cli

browserbash run "Go to the staging billing portal, log in with the \
  test account, open the most recent invoice, and report the invoice \
  number and total amount due"

Two design choices make this practical rather than a demo toy.

First, it is DOM-based, not screenshot-pixel-based. The agent reads the structured document the browser keeps in memory rather than guessing coordinates from an image. That matters for the brittleness problem at the heart of RPA. When a button shifts 12 pixels or gets a new class name but keeps the same accessible label and role, a DOM-aware agent still finds it. A pixel-matching bot would miss it. It also means runs are faster and far cheaper than vision-based control, because the agent is not paying to re-encode a full screenshot on every turn of the loop.

Second, it returns structure, not just a screenshot you have to eyeball. The objective above asks for an invoice number and a total. BrowserBash gives you a pass or fail verdict and those values as data you can assert on or pipe into another system. That is the difference between a recording you watch and an automation you can trust in a pipeline.

Model choices: run it locally for a $0 bill

A fair worry about AI automation is the API meter. Every step the agent reasons about could be a billed token, and a long workflow has many steps. BrowserBash is built Ollama-first to defuse exactly this.

The default model strategy is auto. It prefers a local Ollama model first, then falls back to ANTHROPIC_API_KEY, then OPENAI_API_KEY if those are set. When you run against a local model, the bill is zero and nothing leaves your machine. For RPA-style work over internal portals and sensitive customer data, that local-first, data-stays-home posture is often the deciding factor on its own. You can also point it at a hosted model through OpenRouter, Anthropic, or OpenAI when you want one.
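The fallback order described above can be written down as a short sketch. This is not BrowserBash's actual implementation: the function name is hypothetical, and how the tool detects a local Ollama model is not stated in this article, so that check is passed in as a plain boolean.

```python
def pick_model_backend(ollama_available, env):
    """Mirror the described 'auto' order: local first, then hosted keys.

    ollama_available: whether a local Ollama model can be used (assumed input).
    env: a mapping of environment variables, e.g. dict(os.environ).
    """
    if ollama_available:
        return "ollama"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no local model and no hosted API key available")


# With a local model present, hosted keys are never consulted.
choice = pick_model_backend(True, {"OPENAI_API_KEY": "example-key"})
```

The practical consequence of this ordering is that setting a hosted key does not silently move your data off the machine as long as a local model is available.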

There is a real limitation to be honest about here. Tiny local models, roughly 8 billion parameters and under, get flaky on long, multi-step objectives. They lose the thread, repeat a step, or declare victory early. The sweet spot for reliable runs is a Qwen3 or Llama 3.3 70B-class model, or a capable hosted model. If you try to run a fifteen-step reconciliation flow on a 3B model and it wobbles, that is the model's limit, not a flaw in the approach. Size the model to the job and the determinism improves sharply. The pricing page lays out the free, open-source footing; the engine itself does not charge you.

Making web RPA reliable in CI

The point of automation is that it runs without you. A run you have to babysit on your laptop is a script, not an automation. BrowserBash is built to live in continuous integration, where most serious web RPA should eventually run.

Agent mode emits newline-delimited JSON, so a pipeline can parse every step the agent took, and it sets meaningful exit codes (0, 1, 2, 3) so your CI job passes or fails on its own without a human reading logs.

browserbash run "Open the partner portal, submit the weekly status \
  report form with this week's numbers, and confirm the success \
  banner appears" --agent
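On the consuming side, a pipeline step might read that output along these lines. This is a minimal sketch under stated assumptions: the event fields ("step", "action") are hypothetical, since this article does not document the schema, and only the usual convention that exit code 0 means success is assumed. The meanings of 1, 2, and 3 are not defined here, so the sketch treats any nonzero code as a failure.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


def run_passed(exit_code):
    """Assumption: 0 is success; every other code fails the CI job."""
    return exit_code == 0


# Hypothetical captured output; real field names may differ.
sample_output = (
    '{"step": 1, "action": "open partner portal"}\n'
    '{"step": 2, "action": "submit weekly status form"}\n'
    "\n"
    '{"step": 3, "action": "confirm success banner"}\n'
)

events = parse_ndjson(sample_output)
last_action = events[-1]["action"]
```

Because each line is a complete JSON object, a pipeline can process steps as they arrive instead of waiting for the whole run to finish.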

For the workflows you run on a schedule, the Markdown test format keeps them readable and reviewable. You write a *_test.md file in plain English, parameterize the changing bits with {{variables}}, and keep credentials out of the file with masked secrets so they never land in logs or version control.

browserbash testmd run weekly-report_test.md \
  --var week="2026-03-16" \
  --var region="EU"
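The two ideas in play here, filling {{variables}} at run time and keeping secrets out of logs, can be sketched generically. This is an illustration of the pattern only, not BrowserBash's own code; the function names, the sample sentence, and the "***" mask are all hypothetical.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace each {{name}} with its value; fail loudly on a missing one."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


def mask(text, secrets):
    """Replace every occurrence of a secret value before text is logged."""
    for value in secrets:
        if value:
            text = text.replace(value, "***")
    return text


step = render(
    "Submit the report for week {{week}} in region {{region}}",
    {"week": "2026-03-16", "region": "EU"},
)
log_line = mask("typed password hunter2 into the login form", ["hunter2"])
```

Failing on a missing variable is a deliberate choice in this sketch: a scheduled run that silently submits a blank week is worse than one that stops.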

When something does go wrong at 3 a.m., you want evidence, not a guess. The --record flag captures a .webm video, a screenshot, and a trace of the run, so you can see exactly what the agent saw and did. That replaces the RPA habit of re-running a broken bot by hand just to watch where it falls over.

browserbash run "Reset a test user's password through the admin \
  console and verify the confirmation email arrives in the inbox" \
  --record

For browser provisioning, the --provider flag selects where the browser actually runs: local on your own machine, cdp to attach to a Chrome you control, or hosted grids via browserbase, lambdatest, and browserstack. You keep one English objective and swap the execution surface underneath it. The tutorials walk through each provider end to end if you want to wire one into your stack.
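If you launch runs from a script, one way to keep the objective fixed while swapping the execution surface is a small command builder. This is a hedged sketch: the provider names come from the list above, but the exact argument syntax (the value following --provider as a separate argument) is an assumption, and the helper name is hypothetical.

```python
PROVIDERS = ("local", "cdp", "browserbase", "lambdatest", "browserstack")


def build_command(objective, provider, agent=False, record=False):
    """Build an argv list for a run; reject provider names not listed above."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    # Assumed syntax: the provider value follows the flag as its own argument.
    argv = ["browserbash", "run", objective, "--provider", provider]
    if agent:
        argv.append("--agent")
    if record:
        argv.append("--record")
    return argv


objective = "Open the partner portal and confirm the success banner appears"
local_run = build_command(objective, "local", record=True)
grid_run = build_command(objective, "browserstack", agent=True)
```

Passing the result to subprocess.run as a list avoids shell quoting problems with apostrophes and commas in the objective.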

Where traditional RPA and computer-use agents still win

Here is the honest part, and it is the most important section in this article. BrowserBash is browser-scoped. It automates web browsers. It is not a general operating-system controller, and you should not pretend it is.

If your process lives partly outside the browser, BrowserBash is the wrong primary tool for those parts. Real cases where a different category wins:

Native desktop apps. A legacy Windows ERP client, a thick-client accounting tool, a desktop terminal emulator. There is no DOM to read. A general computer-use agent that operates by screenshot and synthetic mouse and keyboard events, or a traditional RPA platform with desktop recorders, is the right fit.
Cross-application workflows that touch the OS. Move a file from a network share, rename it, attach it to a desktop email client, then update a spreadsheet on disk. That is operating-system orchestration, not browser work.
Citrix, RDP, and virtualized sessions. When the "app" is a remote desktop streamed as pixels, there is no structured document to target. Pixel-based automation is the only door in.
High-volume, perfectly stable internal screens. If a workflow truly never changes and runs millions of times, a deterministic, low-cost classic RPA bot on stable selectors can be cheaper per run than any model-driven approach.

A useful mental model for 2026: the market is converging on hybrid setups where classic RPA provides reliable, high-volume execution and AI agents provide adaptive reasoning for the messy parts. BrowserBash is not trying to be the whole stack. It is the sharp tool for the browser slice of it.

Where BrowserBash wins is equally clear. When the task lives in a web browser, a DOM-based agent is cheaper than pixel-based computer use, faster because it skips screenshot encoding on every step, more deterministic because it targets structured elements instead of guessing coordinates, and CI-friendly out of the box. Most modern business workflows (SaaS dashboards, internal admin panels, partner portals, and vendor web apps) are exactly that: browser tasks. That is the slice where you should reach for it first.

BrowserBash vs traditional web RPA: a side-by-side

The table below compares approaches for the specific case of automating web applications. Treat competitor figures as indicative; exact pricing and capabilities vary by contract and release, and not all of it is public as of 2026.

Dimension	Selector-based web RPA (classic)	OS-level computer-use agents	BrowserBash (browser-scoped AI)
How it targets elements	Hardcoded selectors, coordinates, image match	Screenshot pixels + synthetic input	DOM structure the browser already has
Survives a UI restyle	Often breaks, needs re-record	Usually adapts, may misclick	Usually adapts via element role and text
Authoring	Recorder or visual designer	Plain-English goal	Plain-English objective, no selectors
Scope	Web and desktop, depending on product	Whole OS, any window	Web browsers only (honest limit)
Cost model	Per-bot licensing + maintenance	Per-token, screenshot-heavy	Free, open-source; $0 with local models
Where data goes	Vendor-dependent	Vendor-dependent	Stays on your machine with local models
CI fit	Varies, often add-on	Heavier, slower per step	NDJSON + exit codes built in
Best for	Stable high-volume internal screens	Native apps, Citrix, cross-OS flows	SaaS, portals, admin panels in a browser

The pattern is not that one column wins everything. It is that each column has a home. If your work is overwhelmingly web, the right-most column is built for it. If your work spans native desktop software, the middle column or a classic desktop RPA platform earns its keep.

A realistic migration path off brittle web bots

You do not rip out a working RPA program on a Friday. A sane migration is incremental and starts where the pain is sharpest.

Start with the bots that break most often. Pull your maintenance tickets and find the web workflows that page someone every other sprint. Those are almost always selector-bound bots against UIs you do not control: partner portals, vendor dashboards, third-party SaaS. They are the highest-value, lowest-risk candidates to replace with an English objective, because the brittleness you are escaping is the whole reason they hurt.

Rewrite the objective, not the clicks. Take the bot's purpose, "download the daily settlement file and confirm the row count," and write that as one BrowserBash objective. You are translating intent, not transcribing steps. This is usually far shorter than the recorded version, and there is nothing to re-record next quarter.

Pin it to a model that can do the job. For a short flow, a strong local model is fine and free. For longer, multi-step reconciliations, use a 70B-class local model or a hosted one so the agent stays coherent across all the steps. Run it a dozen times against staging before you trust it. Determinism comes from a capable model plus a clear objective, so invest in both.

Wire it into CI with evidence on. Move the run into your pipeline with --agent for machine-readable output and --record for video and traces while you build confidence. Once it is green for a couple of weeks, you can dial recording back to failures only.

Keep the hybrid boundary explicit. If a workflow dips out of the browser into a desktop app or the file system, do not force BrowserBash to do the OS part. Hand that segment to your existing RPA platform or a computer-use agent and let BrowserBash own the browser segment. A clean handoff beats a leaky one-tool-for-everything fantasy. The case studies and the broader learn library show this kind of browser-scoped flow in practice.

Done this way, your maintenance backlog shrinks because the replaced workflows stop chasing selectors, your bill drops because local models cost nothing to run, and the parts that genuinely belong in classic RPA stay there. That is the realistic win, not a magic rewrite of everything you own.
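The "run it a dozen times against staging" advice from the migration steps can be turned into a simple promotion gate. This is a sketch, not a prescribed policy: the function names are hypothetical, the thresholds are illustrative defaults, and it assumes only that exit code 0 means a passing run.

```python
def pass_rate(exit_codes):
    """Fraction of runs that exited with 0 (assumed to mean success)."""
    if not exit_codes:
        raise ValueError("no runs recorded")
    passed = sum(1 for code in exit_codes if code == 0)
    return passed / len(exit_codes)


def ready_for_ci(exit_codes, minimum_runs=12, required_rate=1.0):
    """Promote a workflow only after enough staging runs at the required rate."""
    return len(exit_codes) >= minimum_runs and pass_rate(exit_codes) >= required_rate


staging_runs = [0] * 11 + [1]  # hypothetical: eleven passes, one failure
rate = pass_rate(staging_runs)
promote = ready_for_ci(staging_runs)
```

A single failure in twelve runs blocks promotion under these defaults, which is the moment to try a more capable model or a clearer objective rather than lower the bar.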

FAQ

Can AI replace traditional RPA for web applications?

For browser-based workflows, an AI agent that drives a real browser from a plain-English objective can replace many brittle, selector-bound web bots and cut the maintenance they generate. It does not replace RPA for native desktop apps, Citrix or RDP sessions, or file-system orchestration, where a classic RPA platform or an OS-level computer-use agent is the right tool. Most teams in 2026 run a hybrid: AI agents for the adaptive web parts, classic RPA for stable high-volume or desktop work.

Why do RPA bots break so often on websites?

Most web RPA targets elements by hardcoded selectors, coordinates, or image matches, which assume the page never changes its structure. Modern front ends regenerate class names, run A/B layout tests, and re-nest the DOM on normal release cycles, so a change invisible to a human still breaks a selector-bound bot. A DOM-aware AI agent looks for the element by its role and visible text instead, so it usually survives a restyle that would kill a coordinate-based bot.

Is BrowserBash a general computer-use tool like Operator?

No, and that distinction is deliberate. BrowserBash is browser-scoped: it automates web browsers using the DOM rather than controlling the whole operating system through screenshots. General computer-use agents can operate any window and any native app, which is the right choice for desktop software, but they are slower and more expensive per step for web tasks. For work that lives in a browser, the DOM-based approach is cheaper, faster, more deterministic, and CI-friendly.

How much does it cost to automate web apps with BrowserBash?

BrowserBash itself is free and open-source under the Apache-2.0 license, so there is no per-bot license fee. When you run it against a local Ollama model, there is no API bill and your data never leaves your machine. If you choose a hosted model through Anthropic, OpenAI, or OpenRouter instead, you pay that provider's usage rates, but the local-first default means many teams run real workflows at a zero model bill.

Stop re-recording bots every time a button moves. Install the CLI, point it at a web app, and describe the outcome in English.

npm install -g browserbash-cli

Try it locally for free, and create an optional account at https://browserbash.com/sign-up when you want the cloud dashboard.