Agentic RPA, explained

Name: BrowserBash
Author: The Testing Academy

If you have ever watched a rule-based bot fail because someone renamed a button, you already understand the problem agentic RPA is trying to solve. Agentic RPA puts a large language model in charge of a process: instead of replaying a fixed script of clicks, an agent reads a goal, looks at the current state, decides the next move, and adapts when the screen is not exactly what it expected. The promise is automation that bends instead of breaking. The reality, as of 2026, is more nuanced, and this article is about getting that nuance right.

Traditional Robotic Process Automation has run back-office work for over a decade by recording exact sequences against stable interfaces. It is fast, cheap per run, and predictable, right up to the moment a field moves or a vendor ships a redesign. Agentic RPA flips the control model: the LLM becomes the decision-maker, and deterministic tools become the hands it uses to act. Below you will see how that works, what the research actually says about reliability versus flexibility, where the hybrid model is landing in enterprises, and how a browser-scoped tool like BrowserBash fits as the deterministic layer for the web slice of a process. You will also see, honestly, where it does not fit.

What agentic RPA actually means

Strip the marketing and agentic RPA is a control loop with an LLM at the center. The agent receives an objective in plain language ("reconcile yesterday's failed payments and flag any over $500"). It perceives the current state of a system, reasons about what to do next, emits an action, observes the result, and repeats until the objective is met or it gives up. The model is not following a recorded path. It is choosing each step based on what it sees right now.
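To make the loop concrete, here is a minimal Python sketch of perceive, decide, act, observe, repeat. It uses a toy in-memory payment queue in place of a real system. The hypothetical rule_policy function stands in for the LLM call; a real agent would send the observed state to a model and parse the action it chooses. The step budget is how the loop "gives up."

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PaymentQueue:
    """Toy stand-in for a real system: yesterday's failed payments, in dollars."""
    pending: list[float]
    flagged: list[float] = field(default_factory=list)
    handled: int = 0

    def observe(self) -> dict:
        # Perceive: report the current state, not a recorded position.
        return {"next_amount": self.pending[0] if self.pending else None}

    def act(self, action: str) -> None:
        # Act: apply one decision to the system.
        amount = self.pending.pop(0)
        if action == "flag":
            self.flagged.append(amount)
        self.handled += 1


def rule_policy(state: dict) -> str:
    """Hypothetical stand-in for the LLM's decision; a real agent calls a model here."""
    if state["next_amount"] is None:
        return "done"
    return "flag" if state["next_amount"] > 500 else "reconcile"


def run_agent(system: PaymentQueue, decide: Callable[[dict], str], max_steps: int = 20) -> str:
    for _ in range(max_steps):
        state = system.observe()   # perceive
        action = decide(state)     # reason
        if action == "done":       # objective met
            return "goal met"
        system.act(action)         # act; the next pass observes the result
    return "gave up"               # step budget exhausted
```

Replacing rule_policy with a model call is what makes this agentic. The loop itself does not change, which is why the decision function is passed in rather than hard-coded.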

Compare that to classic RPA. A traditional bot is a deterministic script: open this app, click the element at this position or selector, type this value, press Enter, next row. The logic is hard-coded by a developer who mapped the process once. As long as the interface matches the recording, the bot is flawless and quick. The instant something drifts (a renamed field, a new consent dialog, a layout shift), the bot does the wrong thing or stops cold, because it has no concept of intent. It only knows positions and rules.

The agentic version trades that brittleness for judgment. Because the LLM reasons about goals rather than coordinates, it can absorb small interface changes, handle a branch the original author never scripted, and read unstructured inputs to decide what to do. That adaptability is the entire pitch. It is also the source of every hard problem in the category, because judgment is probabilistic and rules are not.

Agents, tools, and the orchestration layer

A useful way to picture agentic RPA is three layers stacked on top of each other.

The reasoning layer is the LLM. It interprets the goal, plans, classifies, and decides. This is where flexibility lives.
The action layer is a set of deterministic tools the agent can call: an RPA bot that drives a legacy desktop app, an API client, a browser runner that fills a web form, a database query. This is where reliability lives.
The orchestration layer routes work between agents, tools, humans, and queues, and keeps an audit trail of what happened.

The cleanest deployments keep these layers honest. The agent decides; the tools execute; the orchestrator records. When teams blur the lines and let the model both decide and click pixels freely with no deterministic backstop, they get demos that dazzle and pipelines that flake. Keeping the probabilistic part small and the deterministic part large is the single most important design choice in this space.
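One way to keep the layers honest in code is to let the reasoning layer only name a tool from a fixed registry, have the orchestrator refuse anything off that menu, and log every decision. The sketch below uses hypothetical tools (lookup_invoice, send_reminder) and a stub reasoner in place of an LLM. It illustrates the separation of layers, not any particular vendor's orchestration API.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any, Callable


# Action layer: deterministic tools (hypothetical examples) with fixed, testable behavior.
def lookup_invoice(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "status": "overdue"}


def send_reminder(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "reminder_sent": True}


TOOLS: dict[str, Callable[..., dict]] = {
    "lookup_invoice": lookup_invoice,
    "send_reminder": send_reminder,
}


class Orchestrator:
    """Orchestration layer: routes the reasoner's decision to a tool and records it."""

    def __init__(self, tools: dict[str, Callable[..., dict]],
                 reasoner: Callable[[str], tuple[str, dict[str, Any]]]):
        self.tools = tools
        self.reasoner = reasoner
        self.audit_log: list[str] = []

    def handle(self, goal: str) -> dict:
        tool_name, args = self.reasoner(goal)  # reasoning layer decides
        tool = self.tools.get(tool_name)
        if tool is None:
            # The model may only choose from the menu; anything else is refused, not improvised.
            result = {"error": f"unknown tool {tool_name!r}"}
        else:
            result = tool(**args)  # action layer executes
        self.audit_log.append(json.dumps({
            "at": datetime.now(timezone.utc).isoformat(),
            "goal": goal,
            "tool": tool_name,
            "args": args,
            "result": result,
        }))
        return result


def stub_reasoner(goal: str) -> tuple[str, dict[str, Any]]:
    """Hypothetical stand-in for an LLM call that picks a tool and its arguments."""
    return "lookup_invoice", {"invoice_id": "INV-1042"}
```

The reasoner never touches a system directly. It can only pick from TOOLS, and every pick, valid or not, lands in the audit log.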

Why traditional RPA hits a wall

RPA earned its place. For high-volume, stable, repetitive work, a recorded bot is hard to beat: it runs in milliseconds per step, costs almost nothing per execution, and produces the same result every time. Finance, insurance, and operations teams automated millions of hours of data entry and reconciliation this way. None of that goes away because LLMs arrived.

The wall is maintenance and ambiguity. RPA bots are notoriously fragile against change, and the cost is paid in broken automations every time an upstream vendor ships a UI update. Each bot encodes one path through one version of one interface. Multiply that by hundreds of processes and the maintenance backlog quietly eats the savings the bots were supposed to deliver. RPA also cannot handle ambiguity: it cannot read a freeform email and decide which of five workflows applies, because there is no rule for "decide." Anything requiring judgment had to be carved out and handed to a person.

Agentic RPA targets exactly those two weaknesses. An agent that reasons about intent tolerates the layout drift that snaps a recorded bot, and an agent that reads natural language can do the classification step that rules never could. That is real, and it is why the major automation vendors pivoted hard toward agents in 2025 and 2026. The honest follow-up question is what you give up to get it.

What you trade: flexibility for determinism

This is where the marketing and the measurements part ways, so it is worth being precise. A 2026 comparative study, "Are LLM Agents the New RPA?", pitted agentic computer-use automation against traditional RPA across standard enterprise tasks: data entry, monitoring, and document extraction. The finding was not "agents win." It was that RPA outperformed the agentic approach on execution speed and reliability, especially in repetitive, stable environments, while the agentic approach significantly reduced development time and adapted more flexibly to dynamic interfaces. The authors also noted plainly that current agentic computer-use implementations are not yet production-ready for the hardest cases.

Read that again, because it sets realistic expectations. The agent's advantage is build speed and adaptability. The bot's advantage is raw speed and consistency. You are trading determinism for flexibility, and the right call depends entirely on which of those a given process needs more.

Determinism matters more than engineers new to LLMs usually expect. Businesses need consistent schemas, repeatable steps, and auditable records. LLMs produce probabilistic output: the same input can yield a different path on a different run. In a regulated workflow, non-determinism is a governance problem, because an agent can make a decision that is hard to explain, audit, or reverse. That is why "evaluation" rather than "capability" is the live blocker for autonomous agents in serious enterprise settings, and why a meaningful share of agentic projects are at risk of being shelved before production. The technology can act; proving it acts correctly, every time, is the unsolved part.
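Because the same input can take a different path on a different run, one practical evaluation is to replay a task many times and measure how often the agent takes its most common path. This is a minimal sketch under stated assumptions: the simulated agent (make_simulated_agent) and its action names are hypothetical, and a seeded random generator stands in for model sampling so the result is reproducible.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable


def consistency(agent: Callable[[str], tuple[str, ...]], task: str,
                runs: int = 50) -> tuple[float, tuple[str, ...]]:
    """Run one task repeatedly; return (share of runs on the most common path, that path)."""
    paths = Counter(agent(task) for _ in range(runs))
    modal_path, count = paths.most_common(1)[0]
    return count / runs, modal_path


def make_simulated_agent(deviation_rate: float, seed: int = 0) -> Callable[[str], tuple[str, ...]]:
    """Hypothetical agent: usually takes the expected path, sometimes a different one."""
    rng = random.Random(seed)

    def agent(task: str) -> tuple[str, ...]:
        if rng.random() < deviation_rate:
            return ("open_queue", "skip_review", "approve")   # divergent path
        return ("open_queue", "review", "flag_over_500")      # expected path

    return agent
```

A team might gate promotion to production on that share clearing a threshold it chooses (for example 0.99 across a representative task set). The right number is a policy decision, not something this sketch can tell you.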

The practical takeaway is to shrink the surface where the model has free rein. Let the agent decide and classify, then hand the actual execution to a deterministic tool whose behavior you can pin down, log, and replay. Which is, conveniently, exactly the architecture the industry converged on.

The 2026 reality: hybrid, not replacement

The headline most vendors avoid saying loudly is that agentic RPA in 2026 is mostly a hybrid, not a wholesale replacement. RPA does not die. It moves down a layer and becomes the reliable executor that agents call. The agent reads an email, classifies the case, decides what to do, and invokes a specific deterministic bot that carries out the action the same way every time. The agent supplies cognitive flexibility; the bot supplies precise execution; each tier does what the other cannot.
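The "agent decides, bot executes" handoff often hinges on one small, strict step: turning the model's free-text answer into one of a fixed set of case types, and sending anything it cannot map to a person. Here is a minimal Python sketch with hypothetical case types and stand-in bots. The raw label would come from an LLM classification prompt, which is not shown.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class CaseType(Enum):
    REFUND = "refund"
    ADDRESS_CHANGE = "address_change"
    INVOICE_QUERY = "invoice_query"


# Deterministic bots: each does one thing the same way every time (hypothetical stand-ins).
def refund_bot(case_id: str) -> str:
    return f"refund issued for {case_id}"


def address_bot(case_id: str) -> str:
    return f"address updated for {case_id}"


def invoice_bot(case_id: str) -> str:
    return f"invoice details sent for {case_id}"


BOTS: dict[CaseType, Callable[[str], str]] = {
    CaseType.REFUND: refund_bot,
    CaseType.ADDRESS_CHANGE: address_bot,
    CaseType.INVOICE_QUERY: invoice_bot,
}


def parse_label(raw: str) -> CaseType | None:
    """Normalize the model's free-text answer and accept only known labels."""
    try:
        return CaseType(raw.strip().lower())
    except ValueError:
        return None


def dispatch(case_id: str, raw_label: str) -> str:
    case_type = parse_label(raw_label)
    if case_type is None:
        return f"{case_id} queued for human review"   # unmappable: a person decides
    return BOTS[case_type](case_id)                   # mappable: a fixed bot executes
```

The design choice is that the model's output is treated as untrusted input. It selects a bot; it never becomes the action itself.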

This shows up in the products. UiPath repositioned from an RPA specialist to an agentic orchestration platform, with orchestration that coordinates AI agents, RPA robots, and human contributors together, plus tooling to build and govern agents alongside existing automations. Automation Anywhere frames agentic workflows as LLM-driven sequences orchestrated inside larger end-to-end automations. The common pattern across both is a two-layer model: agents as decision orchestrators, deterministic automation as the rules executor, humans at the high-stakes decision points, and governance underneath for traceability.

For a senior SDET or automation engineer, the design lesson is concrete. Do not ask one giant agent to both reason and reliably push buttons across every system. Decompose the process. Use the LLM for the genuinely ambiguous parts (interpretation, routing, exception handling), and route every deterministic step to a tool built for that step. The web steps in particular (logging in, filling a form, reading a status) are a natural fit for a purpose-built browser runner rather than a general OS agent guessing pixel coordinates.

Computer-use agents versus browser-scoped automation

There is a fork in agentic RPA worth understanding because it drives cost, speed, and reliability. Some agents perceive the world as raw pixels on a screen and act by synthesizing mouse and keyboard events at guessed coordinates. These are general computer-use agents, and their superpower is that they can, in principle, drive anything a human can see: any native desktop app, a remote-desktop window, a legacy tool from 2009 with no API. For true OS-level automation, that generality is the right tool, and nothing browser-scoped can match it.

Other agents stay inside the browser and act against the DOM, the structured document the browser already holds in memory, instead of a flat screenshot. They target elements the runtime knows about rather than pixel positions. This is narrower by design. It only works for web tasks. In exchange, it is cheaper, faster, more deterministic, and far friendlier to CI, because reasoning over structured elements takes fewer model calls than re-screenshotting after every action, and a 12-pixel layout shift does not silently break a DOM-based step the way it breaks a coordinate-based one.
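The layout-shift point can be shown with a toy page model. This sketch uses a simplified page (a list of elements with a role, a name, and a bounding box) rather than a real DOM or screenshot, so it is an illustration of the failure mode, not of any runner's internals. It compares replaying a recorded click point with resolving the target by role and name after every element moves down 12 pixels.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def click_at(page: list[Element], px: int, py: int) -> Element | None:
    """Coordinate-based: whatever happens to sit under the recorded point."""
    for el in page:
        if el.contains(px, py):
            return el
    return None


def click_by_role(page: list[Element], role: str, name: str) -> Element | None:
    """Element-based: resolve the target by what it is, not where it is."""
    for el in page:
        if el.role == role and el.name == name:
            return el
    return None


page = [
    Element("button", "Delete", x=100, y=200, width=80, height=24),
    Element("button", "Submit", x=100, y=230, width=80, height=24),
]
RECORDED_POINT = (140, 235)  # where "Submit" was when the bot was recorded

# Hypothetical redesign: a new banner pushes everything down 12 pixels.
shifted = [replace(el, y=el.y + 12) for el in page]
```

After the shift, the recorded coordinates land on Delete, while the element lookup still finds Submit. Real runners typically use richer signals than a role and a name, but the coordinate approach fails the same way.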

This is exactly where BrowserBash sits, and the boundary is worth being blunt about. BrowserBash is browser-scoped. It automates web browsers; it is not a general computer-use system and does not control your operating system, desktop apps, or files. If your process must drive a thick-client ERP, a Citrix session, or a native accounting app, a general computer-use model or a classic RPA tool is the honest answer, not BrowserBash. But when the step lives in a browser, and an enormous share of modern back-office work does, a DOM-based runner gives you the deterministic, auditable execution layer the hybrid model needs without paying the pixel tax.

How the two compare for an RPA process

Dimension	General computer-use agent	Browser-scoped automation (BrowserBash)
Scope	Whole OS: any window, native app, remote desktop	Web only: pages, forms, web apps
Perception	Screenshot pixels (mostly)	DOM / structured elements
Reliability vs. layout drift	Sensitive to pixel shifts	Resilient; targets elements, not coordinates
Cost & latency per step	Higher; re-screenshot each action	Lower; fewer model calls
CI-friendliness	Harder; needs a desktop session	Designed for CI; NDJSON + exit codes
Best fit	Legacy desktop, no-API thick clients	The web slice of a process
Not a fit	Overkill/expensive for pure web tasks	Anything off the browser

Neither column is "better." A mature agentic RPA design often uses both: a computer-use agent or RPA bot for the desktop legacy step, and a browser runner for the web step, with the LLM orchestrating between them.

Where BrowserBash fits in an agentic RPA stack

BrowserBash is a free, open-source (Apache-2.0) command-line tool from The Testing Academy that turns a plain-English objective into actions a real Chrome or Chromium browser performs step by step, with no selectors to write. You describe the goal; an AI agent drives the browser and returns a verdict plus structured values you can route onward. In the three-layer model from earlier, BrowserBash is the deterministic-ish action layer for web steps, the hands, while your larger agent or workflow engine plays orchestrator.

A few properties make it a clean fit for the executor role in a hybrid pipeline:

It speaks machine. Agent mode (--agent) emits NDJSON and returns exit codes (0/1/2/3), so an orchestrating agent or a CI job can parse the result and branch on it instead of scraping logs. That is the contract you want when an LLM upstream is deciding what to do with the outcome.
It is DOM-based, not pixel-based. Steps target the structured page, which is what keeps web execution cheap, fast, and resilient to layout drift, the determinism you are trying to maximize in the action layer.
It runs your model, including free local ones. The default is Ollama-first with an auto chain (local Ollama, then ANTHROPIC_API_KEY, then OPENAI_API_KEY), so you can run a fully local model for a $0 inference bill with nothing leaving your machine, or point it at a hosted model when a step needs more horsepower. OpenRouter and Anthropic are supported too.
Steps are versionable. Write repeatable checks as Markdown *_test.md files with {{variables}} and masked secrets, run them in CI, and record runs (--record produces a .webm, a screenshot, and a trace) for the audit trail governance demands.
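The provider auto chain described above (local Ollama first, then ANTHROPIC_API_KEY, then OPENAI_API_KEY) amounts to an ordered fallback. The sketch below illustrates that order in Python; it is not BrowserBash's own code. The probe targets Ollama's default local endpoint (http://localhost:11434/api/tags), and the returned provider names are labels chosen for this example.

```python
from __future__ import annotations

import os
import urllib.request
from typing import Callable, Mapping

# Ollama's default local endpoint; /api/tags lists the locally installed models.
OLLAMA_URL = "http://localhost:11434/api/tags"


def ollama_reachable(timeout: float = 0.5) -> bool:
    """Return True if a local Ollama server answers; connection errors count as 'no'."""
    try:
        with urllib.request.urlopen(OLLAMA_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def pick_provider(env: Mapping[str, str] = os.environ,
                  probe: Callable[[], bool] = ollama_reachable) -> str:
    """Illustrative fallback order: local model first, then hosted keys."""
    if probe():
        return "ollama"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no model provider available: start Ollama or set an API key")
```

Injecting env and probe keeps the fallback logic testable without a running server or real keys.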

Here is what the web executor looks like in practice. A one-shot objective:

browserbash run "Log in with {{USERNAME}} and {{PASSWORD}}, open Billing, \
and read the current account balance"

The same step wired for an orchestrator or a pipeline, emitting NDJSON so the caller can branch on the verdict and structured values:

browserbash run "Open the invoices page, find any invoice marked overdue, \
and return its number and amount" --agent --record
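On the orchestrator side, consuming that agent-mode output can be as simple as running the command, splitting the output into one JSON object per line, and branching on the exit code. This Python sketch rests on stated assumptions: the NDJSON arrives on stdout, exit code 0 means the objective passed, and no particular event field names are relied on. The article does not spell out what codes 1, 2, and 3 mean, so the sketch escalates every non-zero code rather than guessing. Check the BrowserBash documentation before mapping them individually.

```python
from __future__ import annotations

import json
import subprocess


def parse_ndjson(text: str) -> list[dict]:
    """One JSON object per non-empty line; no specific field names are assumed."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_web_step(objective: str) -> tuple[int, list[dict]]:
    """Run one BrowserBash objective in agent mode and capture its events."""
    proc = subprocess.run(
        ["browserbash", "run", objective, "--agent", "--record"],
        capture_output=True, text=True, check=False,
    )
    return proc.returncode, parse_ndjson(proc.stdout)  # assumption: NDJSON on stdout


def route(exit_code: int) -> str:
    # Assumption: 0 means the objective passed. Non-zero codes are escalated
    # as a group because their individual meanings are not documented here.
    return "continue" if exit_code == 0 else "escalate"
```

Keeping the parsed events alongside the exit code lets the orchestrator attach them to an escalation or to the audit trail.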

And a versioned, reusable check stored alongside your code, run from CI with masked credentials:

browserbash testmd run flows/reconcile_payments_test.md

None of those invent capabilities. They are the run, --agent, --record, and testmd run paths the tool actually ships, with {{variables}} for parameterization. Install is a single command, covered at the end of this article, and the only prerequisites are Node 18+ and a local Chrome for the default provider.
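To illustrate what {{variables}} and masked secrets buy you, here is a minimal sketch of the general technique: substitute placeholders into an objective, fail loudly when a value is missing, and mask secret values before anything is logged. It is an illustration in Python, not BrowserBash's implementation, and render and mask are hypothetical helper names.

```python
from __future__ import annotations

import re

# Matches {{NAME}}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Fill every {{NAME}} placeholder; a missing value is an error, not a blank."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


def mask(text: str, secrets: list[str]) -> str:
    """Replace secret values with asterisks before the text reaches a log or report."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "****")
    return text
```

The rendered objective goes to the browser; only the masked copy should ever be written to logs or CI output.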

An honest caveat about local models

Because the model story is Ollama-first, it is worth saying where small local models fall down. Tiny local models (roughly 8B parameters and under) get flaky on long, multi-step browser objectives: they lose the thread, repeat steps, or misjudge when a goal is complete. For anything beyond a few steps, the sweet spot is a Qwen3 or Llama 3.3 70B-class model, or a hosted model. Use the small local models for short, well-scoped steps and free iteration; reach for a bigger or hosted model when the web flow is long. Pretending a 3B model will reliably drive a ten-step checkout would be exactly the kind of overpromise this article is arguing against.
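The caveat translates into a simple routing rule. This sketch assumes the caller knows roughly how many steps a flow has and treats three steps as the cutoff for a small local model. Both the cutoff and the tier labels are hypothetical placeholders for whatever models you actually run.

```python
from __future__ import annotations

# Hypothetical tier labels; substitute the models you actually run.
SMALL_LOCAL = "small local model (8B and under)"
LARGE_LOCAL = "large local model (70B-class)"
HOSTED = "hosted model"


def choose_model(step_count: int, large_local_available: bool,
                 hosted_allowed: bool, short_flow_max: int = 3) -> str:
    """Route short flows to a small local model and long flows to something bigger."""
    if step_count <= short_flow_max:
        return SMALL_LOCAL
    if large_local_available:
        return LARGE_LOCAL
    if hosted_allowed:
        return HOSTED
    raise ValueError(f"a {step_count}-step flow is too long for a small local model")
```

Refusing the long flow outright, rather than silently falling back to the small model, is the point: it surfaces the mismatch before a flaky run does.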

A practical pattern for putting it together

Suppose you are automating invoice intake across a desktop accounting client with no public API and a web vendor portal. A pure-RPA build would script both and break often. A pure single-agent build would be slow and hard to audit on the desktop side. The hybrid pattern threads the needle.

Reason at the top. An LLM agent reads the incoming email, classifies the invoice type, extracts the vendor and amount, and decides which downstream actions to fire. This is the ambiguous part, exactly what the model is good at.
Execute deterministically, per surface. For the legacy desktop client with no API, call a classic RPA bot or a computer-use agent; that surface needs OS-level control. For the web portal, call BrowserBash to log in, submit the invoice, and read back a confirmation number. The web step is structured, so it stays fast and stable.
Keep humans on the high-stakes branch. If the amount crosses a threshold or the agent's confidence is low, route to a person before anything commits. Human-in-the-loop at the decision points is a feature of the design, not a failure of it.
Record everything. Capture NDJSON output, exit codes, and recordings from the browser steps so you have an auditable trail. Governance and traceability are what move an agentic pipeline from a demo into production.
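The four steps can be sketched as a small pipeline. Everything here is hypothetical: the approval threshold, the confidence floor, and the two executor methods, which stand in for a desktop RPA bot and a browser step such as a BrowserBash run. The Invoice values are assumed to come from the LLM's classification and extraction step, which is not shown. Amounts use Decimal so the threshold comparison is exact.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal

APPROVAL_THRESHOLD = Decimal("10000")  # hypothetical: larger amounts need a person
MIN_CONFIDENCE = 0.8                   # hypothetical: shakier extractions need a person


@dataclass
class Invoice:
    vendor: str
    amount: Decimal
    confidence: float  # the agent's self-reported confidence in its extraction


@dataclass
class Pipeline:
    trail: list[dict] = field(default_factory=list)

    def record(self, step: str, surface: str, outcome: str) -> None:
        self.trail.append({"step": step, "surface": surface, "outcome": outcome})

    def desktop_bot(self, inv: Invoice) -> str:
        # Stand-in for a classic RPA bot or computer-use agent on the desktop client.
        return f"posted {inv.vendor} to accounting client"

    def web_runner(self, inv: Invoice) -> str:
        # Stand-in for a browser step, e.g. a BrowserBash run against the vendor portal.
        return f"submitted {inv.vendor} on vendor portal"

    def process(self, inv: Invoice) -> str:
        # Human gate first: nothing commits on a high-stakes or low-confidence case.
        if inv.amount > APPROVAL_THRESHOLD or inv.confidence < MIN_CONFIDENCE:
            self.record("gate", "human", "routed for approval")
            return "pending human approval"
        self.record("post", "desktop", self.desktop_bot(inv))
        self.record("submit", "web", self.web_runner(inv))
        return "done"
```

Every branch writes to the trail, including the one that stops for a person, so the audit record covers what did not happen as well as what did.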

The point of the pattern is that the LLM does as little execution as possible and as much decision-making as the task genuinely needs. That keeps the probabilistic surface small and the deterministic surface large, which is the most dependable way known today to make agentic RPA reliable enough to trust. For browser-step patterns spelled out in full, the tutorials and learn pages walk through objectives, agent mode, and Markdown tests, and the blog covers adjacent topics like CI integration and secret handling.

When to choose what

There is no universal winner, so decide per process.

Stick with traditional RPA when the workflow is high-volume, stable, and repetitive, the interface rarely changes, and you need millisecond-level speed and bit-for-bit consistency. Reconciling a fixed report into a system that has not changed its UI in three years does not need an LLM. Adding one would make it slower, costlier, and less predictable for zero benefit.

Reach for agentic RPA when the process involves ambiguity (interpreting unstructured input, choosing between branches, handling exceptions a script never anticipated), or when the interface changes often enough that maintaining a recorded bot is more expensive than letting an agent reason about intent. Faster development and adaptability are the wins here; just budget for the evaluation and governance work that non-determinism demands.

Choose a general computer-use agent or classic RPA when the task lives off the browser: native desktop apps, thick clients, remote-desktop sessions, no-API legacy systems. This is where OS-level control is mandatory and where BrowserBash is the wrong tool, full stop. Be honest about it.

Choose a browser-scoped runner like BrowserBash when the step lives in a browser and you want the action layer to be cheap, fast, deterministic, and CI-friendly. For the web slice of an agentic process, plain-English objectives, NDJSON output, recordings, local-model support, and versioned Markdown tests give you a reliable executor without the cost and fragility of driving a browser through screenshots. It is free and open source. The honest framing is "best executor for web steps," not "replacement for your whole RPA estate."
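The rules above can be condensed into a per-step decision function. This is a deliberately coarse sketch with hypothetical field names; real decisions also weigh volume, cost, and compliance, which it leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StepProfile:
    in_browser: bool        # does the step happen entirely in a web browser?
    needs_judgment: bool    # unstructured input, branching, unanticipated exceptions
    ui_changes_often: bool  # would a recorded bot need frequent repair?


@dataclass(frozen=True)
class Plan:
    decider: str   # who makes the call
    executor: str  # what carries it out


def plan_for(step: StepProfile) -> Plan:
    decider = "LLM agent" if step.needs_judgment else "fixed rules"
    if not step.in_browser:
        executor = "classic RPA or general computer-use agent"
    elif step.ui_changes_often:
        executor = "browser-scoped runner such as BrowserBash"
    else:
        executor = "traditional RPA bot"
    return Plan(decider, executor)
```

Splitting the answer into a decider and an executor mirrors the hybrid model: judgment and execution are chosen separately, per step.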

Who it is for: SDETs and automation engineers who already run web flows in CI, platform teams building the deterministic action layer under an orchestration agent, and anyone whose process is partly browser-based and is tired of selector scripts snapping on every redesign. If your automation never touches a browser, this is not your tool, and that is fine.

FAQ

What is agentic RPA?

Agentic RPA is automation where a large language model drives a process instead of replaying a fixed script. The agent reads a goal in plain language, looks at the current state, decides the next step, and adapts when things differ from expectations. It contrasts with traditional RPA, which follows hard-coded rules and breaks when an interface changes. In practice, most 2026 deployments are hybrids where the agent decides and deterministic tools execute.

How is agentic RPA different from traditional RPA?

Traditional RPA records an exact sequence of clicks against a stable interface; it is fast and consistent but fragile, and it fails when a field moves or a vendor redesigns a page. Agentic RPA puts an LLM in charge so it can reason about intent, tolerate small interface changes, and handle ambiguous inputs a rule could never cover. The trade is determinism for flexibility: agents adapt better and build faster, while rule-based bots still win on raw speed and repeatability. That is why the common pattern pairs an agent for decisions with deterministic tools for execution.

Does agentic RPA replace traditional RPA?

Mostly no, at least as of 2026. RPA is moving down a layer to become the reliable executor that agents call, rather than disappearing. The agent classifies and decides, then invokes a deterministic bot or tool that carries out the action the same way every time, with humans kept on the high-stakes branches. Major vendors have repositioned around this orchestration model rather than scrapping RPA outright.

Can BrowserBash do general computer-use automation across my desktop?

No. BrowserBash is browser-scoped: it automates web browsers and does not control your operating system, native desktop apps, or files. For true OS-level automation, like driving a thick-client ERP or a remote-desktop session, a general computer-use model or a classic RPA tool is the right fit. BrowserBash is the better choice for the web steps in a process, where its DOM-based approach is cheaper, faster, more deterministic, and friendlier to CI than driving a browser through screenshots.

Ready to build the web-execution layer of your agentic pipeline? Install the CLI and point it at any browser task.

npm install -g browserbash-cli

It is free and open source, and an account is optional. If you want the hosted dashboard for recordings and run history, sign up at browserbash.com/sign-up.