AI browser automation for RPA teams

If you run an RPA practice, most of the AI automation pitches arriving in your inbox this year are noise: vendors telling you to rip out your bots and replace them with agents that "just figure it out." You already know better. Your bots run millions of stable transactions a quarter, and you are not handing reconciliation to a language model that improvises. The useful question is narrower: where in your portfolio does an English-language web objective actually beat a recorded selector, and how do you add that without betting the farm? This article answers that question, focusing on the web slice of your processes and staying honest about where agents do not belong.

The short version: the most fragile part of most RPA estates is the part that drives web applications through brittle selectors and coordinates. That is exactly the surface where an AI agent that reads a plain-English goal and acts against the live DOM earns its keep. BrowserBash is a free, open-source CLI built for that slice. It is browser-scoped on purpose. It will not click around your desktop ERP client or a Citrix window, and this article tells you plainly when your existing RPA tool is the right answer instead. What it does is turn the web steps that keep breaking into objectives that bend.

Where RPA actually breaks for most teams

RPA earned its budget by being deterministic and cheap per run. A recorded bot opens an app, clicks a known element, types a value, presses Enter, and moves to the next row in milliseconds. For high-volume, stable, repetitive work against interfaces that do not change, nothing about that is broken and an LLM is the wrong tool. Keep those bots.

The pain is concentrated and predictable. Industry research consistently puts RPA maintenance at roughly 30 to 50 percent of the automation budget every year, with some analysts citing higher, and a frequently quoted figure is that 30 to 50 percent of enterprise RPA projects are abandoned within two years. That maintenance tax is not spread evenly across your estate. It clusters on a few categories: web portals you do not control, vendor SaaS that ships a redesign on its own schedule, login flows with shifting consent dialogs, and any screen reached through a browser where a layout change moves an element three rows down.

The mechanism is the architecture. Classic RPA against the web is selector-based or coordinate-based. The bot encodes one path through one version of one interface, with no concept of intent. When a supplier renames a field in an ERP refresh or a SaaS vendor reflows a page, the bot does not adapt. It fails, often silently, and a developer spends an afternoon re-recording. Multiply that by every external web target in your portfolio and you have located most of your maintenance backlog. This is the slice worth rethinking; the rest of your estate can stay as it is.

The web slice is the right wedge

Start with web steps specifically rather than your whole process catalog. Web targets change more often than your internal desktop apps, because you do not own the release schedule. They are also the steps where an LLM driving structured page elements has the cleanest advantage, because the browser already holds the page as a Document Object Model the agent can reason about. You do not migrate anything. You carve out the handful of brittle web actions inside an otherwise healthy process and let an agent own just those, while your RPA platform keeps orchestrating, queueing, and running everything else.

What an English web objective replaces

Picture a refund-handling process. Today, an RPA developer recorded the web portion: navigate to the admin console, wait for the search box at a known selector, type the order ID, click a button at a fixed position, select a reason from a dropdown by index, confirm, and assert that a success element with a known class appears. Every one of those steps is a hostage to the portal's markup. The day the vendor renames a CSS class or reorders the dropdown, the recording breaks.

The AI-agent version is one sentence of intent:

browserbash run "Log in to the refund console, search order ORD-1029, click Issue Refund, choose 'Full refund', select reason 'Damaged item', confirm, and verify a green success banner reading 'Refund issued' appears." --record

There are no selectors in that command. A language model reads the objective, looks at the live page, decides which element is the search box and which control issues the refund, acts step by step, and returns a verdict plus any structured values it extracted. When the portal reflows, the agent re-reads the page and usually still finds the search box and the confirm control, because it reasoned about what they are for rather than where they sat. The --record flag captures a screenshot and a .webm session video, so when something looks off three weeks from now you have a replay of exactly what the agent saw.

That is the whole pitch for the web slice: you trade a brittle recording for a durable objective. You are not asking the model to invent your business logic. You are asking it to absorb the layout drift that snaps recorded selectors — the part RPA was never good at.

How BrowserBash fits a hybrid RPA stack

The architecture the whole industry converged on in 2026 is hybrid: an LLM handles the cognitive, ambiguous decisions, and deterministic tools handle execution. Your RPA platform is not going away. It becomes the orchestrator and reliable executor for the steps that should stay deterministic, while agents own the drift-prone parts. BrowserBash slots in as the browser executor for the web slice.

Concretely, there are three places it fits inside an existing process.

As a called step from your orchestrator. Your RPA tool or scheduler runs the process and, when it reaches a flaky web action, shells out to BrowserBash instead of replaying a recorded web step. The agent handles that one objective and returns a result the process branches on.
As the web hands for an upstream agent. If you are building an agentic layer that reads an email, classifies a case, and decides what to do, give it a deterministic web tool to call rather than letting a general computer-use model guess pixel coordinates on a web page. The reasoning stays in your agent; the web execution is a known, loggable command.
As a standalone replacement for a web-only mini-process. Some "processes" are entirely a few web clicks and a data read. Those can move out of the RPA platform wholesale and run as a committed Markdown test on a schedule.

The contract that makes this work in a stack is agent mode. Add --agent and BrowserBash emits NDJSON — one JSON event per line — with step events and a terminal event carrying the final state, plus clean exit codes: 0 passed, 1 failed, 2 error, 3 timeout. Your orchestrator reads the exit code and the structured output instead of scraping a human-readable log. That is the integration surface RPA platforms understand.

browserbash run "Open the supplier portal, log in with {{USERNAME}} and {{PASSWORD}}, \
open the Invoices tab, find the most recent invoice marked Overdue, \
and return its invoice number and amount as JSON" --agent

The {{USERNAME}} and {{PASSWORD}} placeholders are variables; secrets are masked in output so credentials do not leak into logs. That matters when the run record lands in an audit trail.
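On the orchestrator side, consuming that contract could look like the following minimal Python sketch. It assumes the NDJSON stream arrives on stdout, which is not stated above, and it treats each event as an opaque dictionary because the event field names are not specified here. The names WebStepResult, parse_ndjson, and run_web_step are illustrative and not part of BrowserBash.

```python
import json
import subprocess
from dataclasses import dataclass, field

# Exit-code meanings as described above: 0 passed, 1 failed, 2 error, 3 timeout.
EXIT_STATUS = {0: "passed", 1: "failed", 2: "error", 3: "timeout"}


@dataclass
class WebStepResult:
    status: str                                 # passed / failed / error / timeout / unknown
    events: list = field(default_factory=list)  # every parsed NDJSON event, in order

    @property
    def final_event(self):
        """Last event emitted, or None if the run produced no parseable output."""
        return self.events[-1] if self.events else None


def parse_ndjson(text):
    """Parse one JSON value per non-empty line; skip lines that are not JSON."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output instead of crashing the process
    return events


def run_web_step(argv, timeout_s=600):
    """Run a command such as ["browserbash", "run", "<objective>", "--agent"].

    Assumption: events are written to stdout. If the child outlives timeout_s,
    subprocess.run raises subprocess.TimeoutExpired for the caller to handle.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    status = EXIT_STATUS.get(proc.returncode, "unknown")
    return WebStepResult(status=status, events=parse_ndjson(proc.stdout))
```

The process then branches on result.status and passes result.final_event to whatever validation the step needs. Any exit code outside the four documented values is reported as unknown rather than guessed at.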

The honest line: browser-scoped, not computer-use

This is the part most vendor pitches skip, and the part an RPA team most needs to hear. BrowserBash automates web browsers. It is not a general computer-use system and does not drive your operating system. If your process clicks around a native desktop ERP client, a Citrix or remote-desktop session, a legacy Windows application with no web interface, a PDF viewer, or anything outside a browser, BrowserBash is the wrong tool and your existing RPA platform — or a general computer-use model that synthesizes mouse and keyboard at the OS level — is the right one. No browser-scoped tool can match that generality, and pretending otherwise would waste your time.

The trade is deliberate. By staying inside the browser and acting against the DOM rather than a flat screenshot, a browser-scoped agent is cheaper, faster, more deterministic, and far friendlier to CI. Reasoning over structured page elements takes fewer model calls than re-screenshotting after every action, and a twelve-pixel layout shift does not silently break a DOM-based step the way it breaks a coordinate-based one. General computer-use agents buy universal reach by perceiving raw pixels and guessing coordinates, which costs more, runs slower, and is harder to make repeatable.
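A toy model makes the difference concrete. The sketch below is not how BrowserBash or any RPA product is implemented; the Element class and both lookup functions are hypothetical stand-ins for "click at a recorded coordinate" versus "find the control by role and label."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    role: str
    label: str
    x: int
    y: int
    width: int = 120
    height: int = 32


def element_at(page, x, y):
    """Coordinate-style lookup: whatever happens to sit under the point."""
    for el in page:
        if el.x <= x < el.x + el.width and el.y <= y < el.y + el.height:
            return el
    return None


def find_by_label(page, role, label):
    """Structure-style lookup: the element that has this role and label."""
    for el in page:
        if el.role == role and el.label == label:
            return el
    return None


# Layout when the bot was recorded: the click at (410, 210) lands on Issue Refund.
before = [
    Element("button", "Issue Refund", 400, 200),
    Element("button", "Cancel Order", 400, 240),
]
# After a vendor redesign the two buttons swap places.
after = [
    Element("button", "Cancel Order", 400, 200),
    Element("button", "Issue Refund", 400, 240),
]

recorded_click = (410, 210)
wrong = element_at(after, *recorded_click)               # now the Cancel Order button
right = find_by_label(after, "button", "Issue Refund")   # still the intended control
```

The coordinate lookup still returns an element after the redesign, which is what makes the failure silent: the click succeeds, on the wrong control.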

So the rule for an RPA team is clean. Sort each step by where it lives. If it lives in a browser, a browser-scoped agent is the better-fit, cheaper tool. If it lives anywhere else on the desktop, keep it on your RPA platform or a computer-use model. Most estates have plenty of both, and the win is matching each step to the tool that fits it.

A comparison RPA teams can act on

Here is the side-by-side that matters when you are deciding what to move and what to leave. None of this is a benchmark; it is a structural comparison of where each approach fits.

Dimension	Recorded RPA web bot	General computer-use agent	BrowserBash (browser-scoped agent)
Targets	Web + desktop, via selectors/coordinates	Anything a human sees on screen	Web pages only
Survives UI drift	No, breaks on layout/selector change	Often, but via pixel guessing	Often, reasons over the live DOM
Perception model	Selectors / fixed coordinates	Screenshots (pixels)	DOM (structured elements)
Relative cost per run	Very low once built	Highest (vision + many model calls)	Lower than computer-use, free with local models
Determinism	High, until it breaks	Lowest	Higher than pixel agents, DOM-based
Build/maintenance effort	High maintenance on changing UIs	Low build, ongoing eval cost	Low build, objectives instead of selectors
CI-friendliness	Varies, often GUI-bound	Poor (slow, heavy)	Strong (`--agent`, exit codes, headless)
Right home	Stable internal apps, desktop steps	True OS-level / native-app tasks	Web steps inside a larger process

Read the bottom row as the decision. Stable, high-volume, internal or desktop work stays on RPA, where determinism and cost-per-run win. True OS-level tasks go to a computer-use model. The drift-prone web steps — the ones generating your maintenance tickets — are where a browser-scoped agent is the better fit.
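That decision can be written down as a small routing rule. This is a sketch of the heuristic described here, with hypothetical field names and labels, and not a feature of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    surface: str               # hypothetical labels: "browser", "desktop", "citrix", ...
    breaks_on_ui_change: bool  # has this step been generating maintenance tickets?


def route(step):
    """Match a process step to the tool that fits it."""
    if step.surface != "browser":
        # Outside the browser: existing RPA platform or a computer-use model.
        return "rpa-or-computer-use"
    if step.breaks_on_ui_change:
        # Drift-prone web step: candidate for a browser-scoped agent.
        return "browser-agent"
    # Stable web step: leave the working bot alone.
    return "rpa"
```

Running every step in a process inventory through a rule like this gives a first-pass shortlist; the judgment about stakes and volume still belongs to the team.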

What the research actually says about the trade

A 2026 comparative study, "Are LLM Agents the New RPA?", pitted agentic computer-use automation against traditional RPA across standard enterprise tasks like data entry, monitoring, and document extraction. The result was not "agents win." RPA outperformed the agentic approach on execution speed and reliability in repetitive, stable environments, while the agentic approach significantly cut development time and adapted better to dynamic interfaces. The authors also stated plainly that current agentic computer-use implementations are not yet production-ready for the hardest cases.

That is the right frame for an RPA team. The agent's edge is build speed and adaptability against change; the bot's edge is raw speed and consistency on stable surfaces. You are trading determinism for flexibility, and the correct call depends on which a given step needs more. For your stable internal volume, that trade is a bad one and you keep the bot. For the web portal that breaks every quarter, the trade is exactly right.

Respect one determinism caveat even inside the web slice. LLM output is probabilistic; the same objective can take a slightly different path on a different run. So you keep the model's free rein small. Let the agent absorb layout drift and find elements, then assert on a concrete, checkable outcome — a specific success banner, an exact status, a returned value you validate — so a run either meets a defined condition or it does not. Vague objectives are where flakiness lives. Tight, assertable objectives are where this approach is production-grade.
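A deterministic check on a returned value might look like the sketch below. The field names invoice_number and amount and the INV- number format are assumptions for illustration; substitute whatever your objective asks the agent to return.

```python
import json
import re
from decimal import Decimal, InvalidOperation

INVOICE_RE = re.compile(r"INV-\d{4,}")  # hypothetical invoice-number format


def validate_invoice(raw):
    """Return (ok, reason) for the JSON text the agent returned."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "expected a JSON object"

    number = data.get("invoice_number")
    if not isinstance(number, str) or not INVOICE_RE.fullmatch(number):
        return False, "invoice_number missing or malformed"

    try:
        # Go through str() so floats are not converted with binary noise.
        amount = Decimal(str(data.get("amount")))
    except InvalidOperation:
        return False, "amount is not a number"
    if not amount.is_finite() or amount <= 0:
        return False, "amount must be positive"

    return True, "ok"
```

The agent is free to find the invoice however the page allows, but the run only counts as a pass if the value clears a check that has no model in it.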

Make web steps versionable, auditable, and CI-ready

RPA teams live or die by governance, so the operational story matters as much as the capability. BrowserBash treats web checks as artifacts you can commit and review, not click-recorded recipes buried in a vendor console.

You can write a repeatable web action as a Markdown test file — a *_test.md file with {{variables}} for parameterization and masked secrets — and run it like any other build step. That puts the web slice under the same version control, code review, and change history as the rest of your engineering work, which is a governance upgrade over a recording nobody can diff.

browserbash testmd run ./reconcile_supplier_portal_test.md --agent --headless

A few properties make this fit an RPA team's audit expectations.

Steps are versionable. The Markdown test lives in your repository. Changes are reviewed, and you can see who changed an objective and when.
Runs are recordable. --record produces a screenshot and a .webm session video (and, on the builtin engine, a Playwright trace), so when a step fails you watch exactly what the agent saw instead of guessing. That replay doubles as living documentation for a portal nobody on the team fully remembers.
It speaks machine. --agent emits NDJSON and returns exit codes, so a scheduler, CI job, or upstream orchestrator branches on the result programmatically.
History stays local, dashboard optional. Run history is local by default, with an optional cloud dashboard if you want a shared view. Nothing forces your run data off your machine.

For an RPA practice that already enforces change control, this is the missing piece that makes an agent-driven web step acceptable to governance rather than a black box.
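As one way to run committed tests as a build step, the sketch below discovers files by the *_test.md naming convention and invokes the testmd command shown above for each one. The helper names are hypothetical, and returning the highest exit code as the suite result is a design choice of this example, not documented BrowserBash behavior.

```python
import subprocess
from pathlib import Path


def discover_tests(root):
    """Find committed Markdown tests under root by the *_test.md convention."""
    return sorted(Path(root).rglob("*_test.md"))


def build_argv(test_file):
    """Command line for one test, using the flags from the command above."""
    return ["browserbash", "testmd", "run", str(test_file), "--agent", "--headless"]


def run_suite(root, runner=subprocess.run):
    """Run every test and return the highest exit code seen (0 if all passed).

    runner is injectable so the loop can be exercised without the CLI installed.
    """
    worst = 0
    for test_file in discover_tests(root):
        code = runner(build_argv(test_file)).returncode
        print(f"{test_file}: exit {code}")
        worst = max(worst, code)
    return worst
```

A CI job can call run_suite on the repository root and exit with its return value, so a single failing web step fails the build.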

Cost and data control: local models keep the bill at zero

One reason general computer-use agents scare finance is the per-run cost of vision models making many calls per task. BrowserBash takes the opposite default. Its model resolution is Ollama-first: the default auto order tries a local Ollama model first, then ANTHROPIC_API_KEY, then OPENAI_API_KEY. Run a capable local model and your inference bill is zero and nothing leaves your machine, which is the posture you want for a portal holding supplier or customer data. Anthropic, OpenAI, and OpenRouter are available when you want a hosted model.
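That resolution order can double as a pre-flight check in your own wrapper. The sketch below only mirrors the order described above. It is not BrowserBash's implementation, and how you detect a running Ollama instance is left to the caller: the ollama_available argument and both function names are assumptions of this example.

```python
import os


def resolve_provider(env=None, ollama_available=False):
    """Mirror the described auto order: local Ollama, then Anthropic, then OpenAI."""
    env = os.environ if env is None else env
    if ollama_available:
        return "ollama"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None


def require_local(provider):
    """Refuse to proceed when a run against a sensitive portal would leave the machine."""
    if provider != "ollama":
        raise RuntimeError(f"zero-egress policy: expected a local model, got {provider!r}")
    return provider
```

Calling require_local(resolve_provider(ollama_available=...)) before a job that touches supplier or customer data turns the zero-egress posture into an enforced rule rather than a convention.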

Be honest about the local-model caveat, because it determines whether this works in practice. Tiny local models in the eight-billion-parameter range and below get flaky on long, multi-step web objectives. The reliable sweet spot is a Qwen3 or Llama 3.3 70B-class model, or a hosted model when the task is long or high-stakes. Keep objectives short and assertable, run them on a model that can actually follow multi-step instructions, and the local-first path holds up. Ask a small model to drive a fifteen-step flow and it will disappoint you. That is a real boundary, not a footnote.

browserbash run "Log in with the service account, open the orders page, and return the last 10 order IDs and totals as JSON" --model ollama/qwen3

For a team weighing a six-figure RPA renewal against augmenting the brittle web slice, the budget case is simple: a free, open-source tool that runs on local models and integrates through exit codes is an easy experiment to justify.

A migration playbook for the web slice

You do not rebuild anything. You augment. A practical sequence for an existing RPA team looks like this.

Inventory your maintenance tickets. Pull the last two quarters of RPA breakages and tag the ones rooted in a web UI change. That candidate list is almost certainly dominated by external portals and vendor SaaS.
Pick one high-frequency web breaker. Choose a step that fails often and is purely web — a login, a portal search, a status read, a form submit. Low stakes, high annoyance.
Rewrite it as an objective. Express the step in plain English with a concrete assertion at the end. Validate it on a local 70B-class model or a hosted model, with --record on so you can watch the agent work.
Make it an artifact. Move the objective into a committed *_test.md with {{variables}} and masked secrets, so it is versioned and reviewable like everything else.
Wire it into the process. Have your orchestrator or scheduler call browserbash testmd run ... --agent --headless and branch on the exit code. The deterministic backstop is the assertion; the agent only owns finding and acting.
Measure, then widen. Track whether that step still breaks after the next vendor redesign. When it survives a change that used to generate a ticket, move the next web breaker over.
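The first two steps of that sequence reduce to a count. A minimal sketch, assuming your ticket export can be loaded as dictionaries with a step name and a tagged root cause (both key names and the web_ui_change tag are hypothetical):

```python
from collections import Counter


def rank_web_breakers(tickets, top=3):
    """Rank process steps by how many tickets were rooted in a web UI change.

    tickets: iterable of dicts with hypothetical keys "step" and "root_cause".
    Returns a list of (step, count) pairs, most frequent first.
    """
    counts = Counter(
        t["step"] for t in tickets if t.get("root_cause") == "web_ui_change"
    )
    return counts.most_common(top)
```

The step at the top of that list is the candidate for the first rewrite, and re-running the same count after the next vendor redesign is the measurement in the final step.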

Starting small is not timidity; it makes the win legible. One web step that used to break every quarter and now does not is a clean before-and-after your leadership understands, and it sizes the rest of the opportunity for you.

For deeper, copy-pasteable examples, the tutorials and the learn pages walk through login flows, data extraction, and CI wiring end to end, and the blog covers adjacent use cases like internal-tool checks and web scraping.

Who this is for, and who should skip it

Choose a browser-scoped agent for the web slice of your RPA estate when:

The step drives a web application you do not control, and it breaks on vendor UI changes.
You want web checks under version control, in CI, with exit codes and recordings.
Data sensitivity or cost pushes you toward local models and a zero-egress posture.
You can keep objectives short and end them on a concrete assertion.

Stay on your RPA platform, or reach for a general computer-use model, when:

The work is stable, high-volume, internal, and already deterministic — do not fix what is not broken.
The step lives outside a browser: a native desktop client, Citrix, a remote-desktop window, a legacy app with no web UI.
You need OS-level reach that only a computer-use model or RPA tool provides.
The objective is long, branchy, and high-stakes, and you have no hosted model or no human checkpoint to back it.

That split is the whole point. AI automation for RPA teams is not a replacement event. It is matching each step to the tool that fits it, and giving the brittle web steps a tool that bends instead of breaking. Your bots keep the stable volume. Your agents take the drift. The web slice stops generating maintenance tickets.

FAQ

Does AI browser automation replace RPA for my team?

No, and you should be skeptical of anyone who says it does. The 2026 reality is hybrid: your RPA platform keeps running stable, high-volume, deterministic work, and AI agents take over the ambiguous, drift-prone steps. For RPA teams, the highest-value place to start is the web slice — the portal and SaaS steps that break on UI changes — while everything stable stays exactly where it is.

Can BrowserBash automate desktop applications and Citrix sessions?

No. BrowserBash is browser-scoped, meaning it automates web browsers by reasoning over the page DOM. It does not drive native desktop apps, Citrix or remote-desktop windows, or anything outside a browser. For those steps, your existing RPA platform or a general computer-use model is the correct tool, and BrowserBash is meant to complement them on the web steps, not replace them.

How is an AI agent more reliable than a recorded RPA bot for web steps?

A recorded bot targets fixed selectors or coordinates, so it breaks when a vendor reflows a page or renames an element. A browser-scoped agent reads a plain-English objective and reasons about what each control is for against the live DOM, so it usually still finds the right element after a layout change. The tradeoff is that LLM output is probabilistic, so you keep objectives short and end each one on a concrete assertion you can check.

Does using AI browser automation send my data to the cloud?

Not necessarily. BrowserBash resolves models Ollama-first, so if you run a capable local model your inference bill is zero and nothing leaves your machine, which suits portals holding sensitive supplier or customer data. Hosted options through Anthropic, OpenAI, and OpenRouter exist when you want them, and run history stays local by default with an optional cloud dashboard you can choose to enable.

Ready to take the brittle web steps off your maintenance backlog? Install with npm install -g browserbash-cli and rewrite your worst web breaker as an objective tonight. An account is optional — sign up only if you want the shared cloud dashboard.