Automate Signup Flow Testing With an AI Agent

The signup flow is the first thing a new user touches and the last thing most teams test well. You can automate signup flow testing with a recorded script, but the moment a designer moves the password field, adds a "confirm email" step, or swaps the onboarding wizard for a new one, that recording goes red for reasons that have nothing to do with a real bug. This guide takes a different route: you describe the registration journey in English, an AI agent drives a real Chrome browser to carry it out, and when the form layout shifts the agent adapts instead of breaking. The tool is BrowserBash, a free, open-source CLI, and every command below is real and runnable.

There is a reason signup is the worst offender for flaky tests. It is rarely one screen. It is an email field, a password field with its own private list of rules, a "create account" button, a verification email you have to fetch a code from, and then a three- or four-step onboarding wizard that asks for a company name, a role, a team size, maybe an invite. Each of those is a place where the DOM can move, and a record-and-replay tool pins itself to every one of them. We will walk through how to test the whole chain by intent, where the AI approach genuinely wins over recorded steps from tools like BugBug or Ghost Inspector, and — just as important — where it does not.

Why recorded signup tests rot faster than the rest of your suite

Pick an end-to-end suite that has been alive for a year, and the most-edited test file is very often the one that registers a new account. Onboarding is where product teams experiment hardest. They A/B test the field order, add a "how did you hear about us?" dropdown, move the marketing-consent checkbox, introduce a passwordless option and then quietly remove it. Every one of those experiments is a markup change, and a recorded test treats a markup change as a failure.

The mechanism is simple. When you record a signup with a tool like BugBug or Ghost Inspector, it captures a selector for each element you interacted with: a CSS path, an XPath, or an attribute match. That selector is a hard-coded map of the page as it looked on the day you recorded, and the instant the territory changes, the map is wrong. The test does not know that "Email address" and "Your email" are the same field; it only knows it was told to type into input[name="email"], and if a refactor renames that field to input[name="user_email"], it stops dead.

Page objects and centralized locators help — you fix the broken selector in one file instead of twelve — but they do not change the underlying contract. The test still encodes how to find each field rather than what signing up means. Change the "how" and the test forgets the "what." For a stable screen that almost never moves, that is a fine trade. For a signup flow that your growth team treats as a permanent experiment surface, it is a maintenance tax you pay every sprint.

The cost is not the bug you catch, it's the bug you stop trusting

There is a second-order cost that is easy to miss. When a signup test goes red twice a month for cosmetic reasons, engineers learn to glance at it and shrug. "Oh, that's just the onboarding test again, someone moved a field." The next time it goes red for a real reason — the verification email stopped sending, the password rule silently loosened, the wizard's "Next" button stopped advancing — nobody believes it. A flaky guard on your most important conversion funnel is worse than no guard, because it trains the team to ignore the alarm.

How an AI agent automates the signup flow differently

BrowserBash inverts the contract. Instead of recording clicks, you write an objective in plain English and hand it to an AI agent. The agent opens a real Chrome browser, reads the live page on every step the way a person would, decides which element matches your intent, acts, and then judges whether the goal was met. It returns a verdict — passed or failed — plus structured results you can read or pipe into CI.

Here is the whole thing for a basic registration:

browserbash run "Go to https://app.example.com/signup. Enter a unique email like qa+{{ts}}@example.com, type the password 'Str0ng-Pass!42' into both the password and confirm-password fields, accept the terms checkbox, and click Create account. Confirm you land on a page that says 'Verify your email' or shows an onboarding step."

No selectors. No data-testid. The agent figures out that the confirm-password field is the second password input, that the terms checkbox is the one near the "I agree" text, and that "Create account" is the primary button — by reading the rendered page, not by matching a recorded path. When a redesign renames the button to "Get started" or reorders the fields, the agent re-reads the page and proceeds. That is the core difference: a recorded step encodes a coordinate in a DOM that no longer exists; an intent encodes a goal that survives the redesign.
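Whether browserbash run expands {{ts}} inside a one-off objective is not spelled out here, and the objective above hedges with "like". If you want the address to be unique by construction, a minimal sketch generates it in the shell and interpolates it before the agent ever sees the objective. The helper name unique_email is hypothetical, and it assumes your app accepts plus-addressed emails.

```shell
# Hypothetical helper: print a per-run signup address such as
# qa+1735689600-4242@example.com. Assumes the app accepts plus-addressing.
unique_email() {
  prefix="${1:-qa}"          # local part before the "+"
  domain="${2:-example.com}" # mail domain
  # Epoch seconds keep sequential runs apart; $$ (this shell's PID) keeps
  # parallel CI jobs that start in the same second apart.
  printf '%s+%s-%s@%s\n' "$prefix" "$(date +%s)" "$$" "$domain"
}

# Usage (commented out so the snippet can be sourced without side effects):
# EMAIL="$(unique_email qa example.com)"
# browserbash run "Go to https://app.example.com/signup, register with the email $EMAIL and a strong password, and confirm the 'Verify your email' page appears."
```

Two calls from the same shell in the same second return the same address, so generate it once per run and reuse the variable.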

What "adapts when the layout changes" actually means

It helps to be precise here, because "self-healing" is an overused word. The agent is not patching a broken selector behind the scenes. It never had a selector to break. On each run it looks at the current page, identifies the element that satisfies your described intent, and interacts with it. If your signup form had three fields last week and four this week, the agent reads four fields and fills the ones your objective mentions. If the "Sign up" button moved from the top-right to a sticky footer, the agent finds it where it now lives. The resilience comes from re-deriving the plan against reality every time, not from caching a plan and hoping reality matches.

This is also why the failure mode is different. A recorded test is brittle to cosmetic change: any drift in the markup breaks it. An agent tolerates cosmetic change and keeps going until the actual goal cannot be reached, then reports why. You still get real failures when registration is truly broken; you stop getting fake ones when a button merely moved.

Testing the three hard parts: verification, password rules, onboarding

A signup flow is really three sub-problems stitched together. Let's take them one at a time, because each has a quirk that trips up naive automation.

Email verification

Verification is the classic reason signup tests get abandoned. The flow leaves your app, lands in an inbox, and comes back with a code or a magic link, so you cannot test the back half without reading mail. There are two patterns: read a test inbox that your staging environment exposes over the web, or try to reach a real external mailbox. BrowserBash supports the first, saner one cleanly.

If your staging environment exposes a test inbox over the web — Mailosaur, MailHog, Mailpit, Ethereal, or a custom /dev/emails page — the agent can simply go read it. You describe the hop in English:

browserbash run "Register at https://app.example.com/signup with email qa+{{ts}}@inbox.test and password 'Str0ng-Pass!42'. Then open https://mailpit.staging.example.com, find the most recent email to that address, read the 6-digit verification code from it, return to the app's verification screen, type the code, and submit. Confirm the account is now verified."

The agent treats the inbox as just another page to read — which is exactly what it is. The honest caveat: if your verification arrives only via a real external provider (Gmail, Outlook) with no programmatic test inbox, that is a hard problem for any automation, AI or recorded, and you should provision a test-mail service rather than expect the agent to log into a personal mailbox. The win here is not that AI magically reads Gmail; it is that the natural-language step expresses "go get the code from the inbox" without you writing inbox-scraping glue code.
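When the verification hop fails, it helps to know whether the app broke or the test inbox was simply down. A minimal preflight sketch, assuming the inbox's web UI answers a plain HTTP GET, checks it with curl before you spend an agent run. check_inbox and inbox_reachable are hypothetical names, and the Mailpit URL is the placeholder used in the objective above.

```shell
# Hypothetical preflight for the verification hop: is the test inbox up?
inbox_reachable() {
  # -f fails on HTTP 4xx/5xx, -sS stays quiet but still prints errors,
  # --max-time keeps a dead host from hanging the pipeline.
  curl -fsS --max-time 10 -o /dev/null "$1"
}

check_inbox() {
  if inbox_reachable "$1"; then
    echo "inbox ok: $1"
  else
    echo "inbox unreachable: $1 (fix the mail service before blaming signup)" >&2
    return 1
  fi
}

# Usage: check_inbox https://mailpit.staging.example.com && browserbash run "<your verification objective>"
```

This deliberately does not scrape the email itself; reading the code stays the agent's job.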

Password rule enforcement

Password rules are where you actually want negative tests, and where intent-based checks shine. You do not just want to confirm a strong password is accepted; you want to confirm a weak one is rejected with the right message. Recorded tools can do this, but you end up recording a separate brittle path per rule. With an agent you describe the expectation:

browserbash run "On https://app.example.com/signup, type the email qa+{{ts}}@example.com and the password 'password'. Confirm the form shows a validation error that the password is too weak or too short, and that the Create account button is disabled or does not proceed. The signup must NOT succeed."

Notice the objective tells the agent what should fail. The agent reads the live validation state (the inline error text, the disabled button) and renders a verdict on whether the rule held. Vary the password across runs (too short, no number, no symbol, a known breached password if your form checks against a breach list such as Have I Been Pwned) and you have a rule matrix expressed as plain sentences instead of a wall of recorded steps. When the rule copy changes from "Too weak" to "Add a number," the agent still recognizes a validation error and the check passes; a recorded assertion pinned to the old string would fail.
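The rule matrix can be a plain shell loop. A sketch, assuming the illustrative weak passwords below match rules your form actually enforces: each run uses the same negative objective with a different password, and because the objective says the signup must NOT succeed, an exit code of 0 (passed) means the rule held. run_password_matrix is a hypothetical helper name.

```shell
# Hypothetical helper: run one negative signup objective per weak password.
# The passwords are illustrative; swap in the rules your form enforces.
run_password_matrix() {
  base_url="$1"
  failures=0
  for pw in 'short' 'nonumbers!' 'NoSymbol123' 'password'; do
    objective="On $base_url/signup, type the email qa+$(date +%s)@example.com and the password '$pw'. Confirm the form shows a validation error for the password and that the Create account button is disabled or does not proceed. The signup must NOT succeed."
    # Exit code 0 means the agent confirmed the signup was rejected.
    if browserbash run "$objective" --headless; then
      echo "rule held for: $pw"
    else
      echo "rule NOT confirmed for: $pw (exit $?)" >&2
      failures=$((failures + 1))
    fi
  done
  # Non-zero if any weak password was not rejected (or the run errored).
  [ "$failures" -eq 0 ]
}

# Usage: run_password_matrix https://app.example.com
```

Any non-zero code (failed, error, or timeout) counts against the matrix here, so check the printed exit code before concluding a rule is actually broken.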

Onboarding wizards

The multi-step wizard after account creation is where recorded tools suffer most, because a wizard is the most-experimented-with surface in the product. Steps get added, reordered, made optional, gated behind a feature flag. An agent walks the wizard by goal:

browserbash run "After creating an account on https://app.example.com, complete the onboarding wizard: enter the company name 'Acme QA', choose the role 'Engineer', set team size to '11-50', skip any optional invite-teammates step, and click through until you reach the main dashboard. Confirm the dashboard header is visible and the onboarding wizard is gone."

If the wizard grows from three steps to four, the agent keeps clicking "Next" toward the dashboard goal instead of stopping at step three the way a recording would. If an optional step appears, your objective already told it to skip optional steps. You are describing the destination, not the turn-by-turn directions, so a new intersection does not strand you.

A committable signup test you keep in the repo

One-off run commands are great for exploration, but you want your signup checks in version control next to the code. BrowserBash supports Markdown tests — plain *_test.md files where each list item is a step. They take {{variables}} for templating, support @import so you can compose shared setup, and mask any secret-marked variable as ***** in every log line. After a run they write a human-readable Result.md.

# signup_smoke_test.md
- Go to {{base_url}}/signup
- Enter the email qa+{{ts}}@example.com
- Type {{password}} into the password and confirm-password fields
- Accept the terms and conditions checkbox
- Click "Create account"
- Confirm the page shows "Verify your email"

browserbash testmd run ./signup_smoke_test.md \
  --var base_url=https://app.example.com \
  --var ts="$(date +%s)" \
  --secret password='Str0ng-Pass!42'

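Because password is passed as a secret, it shows up as ***** in the logs, the Result.md, and any CI output, so the credential you type during registration never leaks into an archived build log. That is a small thing that matters a lot when your signup test runs on every pull request and those logs live for months. Masking does not reach your shell history, though: a value typed literally on the command line is recorded there like any other argument, so in CI pass it in from a secret environment variable instead.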
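A minimal sketch for CI reads the password from an environment variable that your CI injects as a secret, so it never appears as a literal on any command line. SIGNUP_PASSWORD and BASE_URL are hypothetical variable names, and passing ts with --var assumes {{ts}} is not filled in automatically.

```shell
# Hypothetical CI wrapper: the password comes from the environment, never
# from a literal on the command line.
run_signup_smoke() {
  if [ -z "${SIGNUP_PASSWORD:-}" ]; then
    echo "SIGNUP_PASSWORD is not set; refusing to run" >&2
    return 2  # reuse BrowserBash's "error" code for a setup problem
  fi
  browserbash testmd run ./signup_smoke_test.md \
    --var base_url="${BASE_URL:-https://app.example.com}" \
    --var ts="$(date +%s)" \
    --secret password="$SIGNUP_PASSWORD"
}
```

Refusing to run without the variable is a design choice: an empty secret would otherwise produce a confusing signup failure instead of a clear setup error.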

If you want one signup definition reused across staging and production, put the shared steps in a base file and @import it, then template the base_url. You write the journey once and run it everywhere, which is exactly the kind of reuse recorded tools make awkward.
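Until you split out a base file, the cheapest form of that reuse is a loop that runs the same Markdown test against each environment and changes only base_url. A sketch, assuming the hosts in the usage line are placeholders and that the password comes from a SIGNUP_PASSWORD environment variable (a hypothetical name) injected by your CI. run_everywhere is likewise a hypothetical helper.

```shell
# Hypothetical helper: run the same Markdown test against each base URL
# given as an argument, and fail if any environment fails.
run_everywhere() {
  [ -n "${SIGNUP_PASSWORD:-}" ] || { echo "SIGNUP_PASSWORD is not set" >&2; return 2; }
  overall=0
  for base in "$@"; do
    echo "== signup smoke on $base"
    browserbash testmd run ./signup_smoke_test.md \
      --var base_url="$base" \
      --var ts="$(date +%s)" \
      --secret password="$SIGNUP_PASSWORD" || overall=1
  done
  return "$overall"
}

# Usage: run_everywhere https://staging.example.com https://app.example.com
```

The loop keeps going after a failure so one broken environment does not hide the state of the others.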

BrowserBash vs recorded signup tests: an honest comparison

Record-and-replay tools are genuinely good products, and for some teams they are the right call. Here is a fair side-by-side. Where a competitor's pricing or internals are not publicly documented, the table says so rather than guessing.

Dimension	BrowserBash (AI agent)	BugBug	Ghost Inspector
Authoring model	Plain-English objective	Record clicks in browser/extension	Record clicks in browser/extension
Reacts to layout change	Re-reads live page each run; adapts	Selector-based; editable, can break on drift	Selector-based; editable, can break on drift
Selectors required	None	Captured, editable by you	Captured, editable by you
Where it runs	Local Chrome, CDP, Browserbase, LambdaTest, BrowserStack	Your local browser or their cloud runner (as of 2026)	Cloud SaaS (as of 2026)
Pricing	Free, open-source (Apache-2.0)	Has free tier + paid plans; exact tiers not specified here	Paid SaaS; exact tiers not specified here
Local / no-account run	Yes, no account, $0 on local models	Account required	Account required
CI integration	NDJSON `--agent` mode, exit codes	CI/CLI integrations available	API + scheduling, CI integrations
Determinism	Lower — a model can vary run to run	Higher — replays exact recorded steps	Higher — replays exact recorded steps
Best at	Flows that change often; negative checks by intent	Stable flows; visual step editing	Stable flows; scheduled monitoring

Read the "Determinism" row carefully, because it is the real trade. A recorded test does the exact same thing every run, which is a feature: it is predictable, and a diff in behavior is signal. An AI agent re-plans each run, which is what makes it resilient to layout churn, but it also means two runs are not byte-identical, and even a capable model doing genuine reasoning can occasionally pick the wrong element on a truly ambiguous page. If your signup form is frozen and you value bit-for-bit reproducibility above all, a recorded tool is arguably the better fit, and you should use one. BrowserBash earns its place when the flow moves often enough that selector maintenance is eating your week.
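One practical way to act on that trade-off is to measure it before you make an objective a required gate: run it several times against an unchanged build and count the passes. A minimal sketch follows; measure_pass_rate is a hypothetical helper, and the run count and any pass threshold you enforce are your call.

```shell
# Hypothetical helper: run one objective N times and print "passes/runs".
measure_pass_rate() {
  runs="$1"
  objective="$2"
  passes=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    i=$((i + 1))
    # Only exit code 0 (passed) counts; failed, error, and timeout do not.
    if browserbash run "$objective" --headless >/dev/null 2>&1; then
      passes=$((passes + 1))
    fi
  done
  echo "$passes/$runs"
}

# Usage: measure_pass_rate 10 "Register a new account at https://app.example.com/signup and confirm the verification screen appears"
```

If the same objective against the same build does not pass every time, tighten the wording or move to a stronger model before trusting it as a gate.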

Where BugBug and Ghost Inspector are the better choice

I want to be plain about this. If your team is non-technical and wants a point-and-click recorder with a polished visual step editor and hosted scheduling, BugBug and Ghost Inspector are mature, well-built tools designed exactly for that, and a CLI is not what you want. If you need a managed cloud service that runs monitors on a schedule and pages you when a flow breaks, those products ship that out of the box. BrowserBash is a CLI you run yourself; the optional dashboard is opt-in, not a hosted monitoring service. Match the tool to the team.

Running it in CI without parsing prose

The reason a CLI matters for signup testing is that registration is a pull-request gate. You want it to fail the build when account creation breaks. BrowserBash has an agent mode built for exactly this: --agent emits NDJSON — one JSON event per line on stdout — so your pipeline reads structured events instead of scraping human prose. Exit codes are unambiguous: 0 passed, 1 failed, 2 error, 3 timeout.

browserbash run "Register a new account at https://app.example.com/signup and confirm the verification screen appears" \
  --agent --headless --record --upload

--headless runs without a visible window for CI. --record captures a screenshot and a full .webm session video on any engine — so when a signup run fails at 3 a.m., you have a video of exactly what the agent saw, not a stack trace. On the builtin engine, --record additionally captures a Playwright trace you can open in the trace viewer. --upload (which requires a one-time browserbash connect) pushes the run to the free, opt-in cloud dashboard for run history and per-run replay; uploaded free runs are kept 15 days. Prefer everything local? browserbash dashboard gives you a fully local dashboard with no upload at all.
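A CI step built on those pieces might look like the sketch below. It keeps the NDJSON stream as a build artifact, maps the documented exit codes to a readable message, and leaves --upload off because it needs a one-time browserbash connect. describe_exit, signup_gate, and the events.ndjson filename are hypothetical; nothing here parses individual events, because their fields are not described in this guide.

```shell
# Hypothetical helpers for a CI step around BrowserBash's documented
# exit codes: 0 passed, 1 failed, 2 error, 3 timeout.
describe_exit() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed: the signup objective was not met" ;;
    2) echo "error: the run could not complete" ;;
    3) echo "timeout" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

signup_gate() {
  out="${1:-events.ndjson}"  # NDJSON events (one per line) are kept here
  browserbash run "Register a new account at https://app.example.com/signup and confirm the verification screen appears" \
    --agent --headless --record > "$out"
  code=$?
  echo "signup gate: $(describe_exit "$code")" >&2
  return "$code"  # propagate so the CI job fails on anything but 0
}
```

Returning the original code, rather than collapsing everything to 1, lets the pipeline treat an infrastructure error or timeout differently from a real signup failure.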

For deeper CI patterns and engine options, the learn section walks through the NDJSON event shape and exit-code contract, and the features page lists what each engine and provider supports.

Choosing a model so the signup agent stays reliable

This is the part most write-ups skip, and it is the part that decides whether your signup tests actually hold. BrowserBash is Ollama-first: it defaults to free local models, needs no API keys, and nothing leaves your machine. It auto-resolves in order — local Ollama, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY — so you can run a genuine $0 model bill on local hardware.
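Because the order is fixed, a CI job can report which backend it expects before it runs anything. The sketch below mirrors the order as described here, not BrowserBash's own detection code, and its Ollama probe assumes Ollama's default local API at http://localhost:11434. expected_backend and ollama_up are hypothetical names.

```shell
# Hypothetical probe: is a local Ollama server answering on its default port?
ollama_up() {
  curl -fsS --max-time 2 -o /dev/null http://localhost:11434/api/tags
}

# Mirrors the documented order: local Ollama, then ANTHROPIC_API_KEY,
# then OPENROUTER_API_KEY. Fails when none is available.
expected_backend() {
  if ollama_up; then
    echo "ollama"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"
  elif [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo "openrouter"
  else
    echo "none"
    return 1
  fi
}

# Usage in CI: expected_backend || { echo "no model backend configured" >&2; exit 2; }
```

Failing fast here turns a missing secret into an obvious setup error instead of a signup run that errors out halfway.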

The honest caveat, and it matters most for signup: very small local models (roughly 8B parameters and under) get flaky on long, multi-step objectives. A four-step onboarding wizard plus an email-verification hop is exactly the kind of long chain that exposes a small model — it can lose the thread between steps. The sweet spot for these flows is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model. OpenRouter offers genuinely free hosted models such as openai/gpt-oss-120b:free, and you can bring your own Anthropic Claude key for the hardest flows. The rule of thumb: a quick "does signup load and accept a strong password" check is fine on a small model; a full register-verify-onboard chain wants the mid-size-or-better tier.

A practical setup

For most teams, run local during development for the fast feedback loop and the $0 bill, then point CI at a capable hosted model with OPENROUTER_API_KEY or ANTHROPIC_API_KEY for the full multi-step signup chain where reliability matters most. You get free iteration locally and a dependable gate in the pipeline. If you need the registration to run on a specific browser matrix, switch where the browser runs with one flag:

browserbash run "Sign up with a new account and verify the email, then reach the dashboard" \
  --provider lambdatest --record

The same English objective runs on your local Chrome, any CDP endpoint, Browserbase, LambdaTest, or BrowserStack — you change --provider, not the test.

A realistic end-to-end signup scenario

To make this concrete, here is the kind of full flow BrowserBash handles in one objective — the registration analog of the store-checkout example it ships with (log in, add an item to the cart, complete checkout, verify "Thank you for your order!"):

browserbash run "Go to https://app.example.com/signup. Register with email qa+{{ts}}@inbox.test and password 'Str0ng-Pass!42' in both password fields, accept terms, and submit. Open https://mailpit.staging.example.com, read the verification code from the newest email to that address, return and submit it. Then complete onboarding: company 'Acme QA', role 'Engineer', team size '11-50', skip optional steps, and continue to the dashboard. Confirm the dashboard is visible and report PASS only if a new account exists, is verified, and onboarding is complete." --record

That single objective covers the whole chain: registration with a strong password (the happy path only; the weak-password rules still need their own negative runs), email verification, and the wizard. The --record flag leaves you a video and screenshot to review. When the growth team reshuffles the wizard next month, you should not need to touch this objective: the agent reads the new layout and walks the new path to the same destination. Compare that to re-recording every step in a replay tool each time a field moves, and the maintenance math is obvious for a flow that changes this often. For more worked examples, the BrowserBash blog and the npm package page are good next stops, and the GitHub repo has the source and issues.

When to choose the AI agent and when not to

Reach for BrowserBash to automate signup flow testing when your registration and onboarding screens change often, when you want negative password-rule checks expressed as plain sentences, when verification routes through a web-readable test inbox, and when you want a free, local, no-account way to gate registration in CI. It removes the selector-maintenance tax that makes recorded signup tests rot.

Stay with, or add, a record-and-replay tool when your signup is frozen and you need bit-identical reproducibility, when your team prefers a visual step editor over a CLI, or when you need a hosted monitoring service that schedules runs and alerts you. And keep your unit and contract tests: an agent driving a browser confirms the journey works end to end, but it does not replace a fast unit test on your password-validation function. The strongest setup uses both layers: cheap deterministic checks underneath, an adaptive agent on top guarding the full flow. It is worth comparing your needs against the pricing page and a real case study before you commit.

FAQ

How do you automate signup flow testing without writing selectors?

You describe the registration journey as a plain-English objective and hand it to an AI agent, which opens a real browser, reads the live page on each step, and decides which field and button match your intent. There are no CSS paths or XPaths to write or maintain. Because the agent re-reads the page every run, it adapts when the form layout changes instead of breaking the way a recorded selector would.

Can an AI agent test the email verification step in signup?

Yes, if your staging environment exposes a web-readable test inbox such as Mailpit, MailHog, or Mailosaur. You describe the hop in English — register, open the inbox page, read the code from the newest email, return and submit it — and the agent treats the inbox as just another page to read. If verification only arrives in a real external mailbox with no test inbox, that is hard for any automation, and you should provision a test-mail service first.

Is BrowserBash better than BugBug or Ghost Inspector for signup tests?

It depends on how often your signup flow changes. BrowserBash wins when the flow moves often, because the agent adapts to layout changes instead of breaking recorded steps, and it is free and open-source. BugBug and Ghost Inspector are mature record-and-replay tools that give you deterministic, bit-identical replays and a visual editor, which is the better fit for a frozen flow or a non-technical team that wants hosted scheduling.

How do I keep the test password out of my CI logs?

Pass the password as a secret-marked variable in a Markdown test, for example using the --secret flag with browserbash testmd run. BrowserBash masks any secret-marked value as ***** in every log line, in the generated Result.md, and in CI output, so the credential the agent types during registration never appears in an archived build log. Masking does not reach your shell history, so in CI supply the value from a secret environment variable instead of typing it on the command line.

Ready to automate signup flow testing the resilient way? Install with npm install -g browserbash-cli and write your first plain-English registration test in minutes. No account is needed to run locally — though if you want run history and video replay, the optional free dashboard is one sign-up away.