From Checkly to AI Browser Checks in Plain English

Name: BrowserBash
Author: The Testing Academy

If you have ever opened a Checkly browser check at 2 a.m. because the alert fired, you know the migration question is rarely about features. It is about ownership, cost, and how much Playwright you want to keep hand-writing. This guide is a working walkthrough for teams evaluating an AI browser monitoring alternative to Checkly: how to take the Playwright-based synthetic checks you already run in Checkly and re-express them as plain-English objectives that an AI agent drives in a real Chrome browser, scheduled from your own CI. I will be honest about the trade-off too. Checkly's hosted scheduling and global probe network are things BrowserBash does not try to replace, and for some teams that hosted layer is the whole reason to stay.

I have maintained scripted browser monitors for years, so this is not a checkmark-table piece. It is a comparison of what these two approaches feel like to operate, and a concrete path for moving a check across without throwing away the intent behind it.

What Checkly actually does well

Checkly is a synthetic and API monitoring platform. You write browser checks as Playwright scripts, commit them (often via the Checkly CLI and "monitoring as code"), and Checkly runs them on a schedule from its own globally distributed locations, then alerts you through the usual channels when a check fails. That is the core loop: a real browser flow, executed on an interval from outside your infrastructure, with paging when the journey breaks.

The parts Checkly genuinely nails are the parts that are tedious to build yourself. Hosted scheduling means you are not babysitting a cron host. Multi-region execution means a check can run from several geographies so you catch a CDN or DNS problem that only shows up in one part of the world. The alerting integrations, dashboards, status pages, and retention are all managed for you. If your job is "page me within minutes when checkout is down, from three continents, without me running any infrastructure," Checkly is built precisely for that and does it well.

The friction shows up in two places. First, every browser check is a Playwright script you own: selectors, waits, page objects, and the steady maintenance tax when the UI shifts. Second, pricing scales with check volume and run frequency. As of 2026 the exact plan numbers are best read straight from Checkly's pricing page rather than quoted from memory, because vendors change tiers often and I will not invent figures. The structural point stands regardless of the current sticker: more checks, run more often, cost more, and the scripts are yours to keep green.

Where an AI browser agent changes the equation

BrowserBash takes a different starting point. Instead of writing a Playwright script, you write a plain-English objective and an AI agent drives a real Chrome or Chromium browser step by step. There are no selectors and no page objects. The agent reads the page, decides what to click and type, performs the steps, and returns a pass/fail verdict plus structured results you can act on.

For a synthetic monitoring mindset, that reframing matters more than it first appears. A Checkly browser check encodes a journey as imperative steps tied to the current DOM. A BrowserBash objective encodes the intent of the journey — "log in, add the blue running shoes to the cart, complete checkout, and verify the order confirmation appears" — and lets the agent figure out the mechanics at runtime. When a button's label changes or a field moves, an objective often still passes where a brittle selector would have thrown.

BrowserBash is free and open source under Apache-2.0, built by The Testing Academy. The model story is the other half of why it fits a cost-sensitive monitoring use case. It is Ollama-first: by default it uses free local models with no API keys, and nothing leaves your machine. It auto-resolves a local Ollama install first, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY, so you can run a genuinely $0 model bill on local models, or reach for a capable hosted model on the hard flows. You can read more about how the agent drives a browser on the BrowserBash features page.
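The resolution order described above can be sketched as a small shell function. This is not BrowserBash's own code; it only mirrors the stated order so you can predict which backend a given laptop or CI runner would likely end up on. The name resolve_backend, the OLLAMA_CMD override, and the idea of detecting Ollama by looking for a binary on PATH are all assumptions for illustration, and the tool's real detection may differ.

```shell
# Sketch only: mirrors the documented order (Ollama, then ANTHROPIC_API_KEY,
# then OPENROUTER_API_KEY). resolve_backend is a hypothetical helper.
resolve_backend() {
  # Assumption: "Ollama is installed" is approximated by a binary on PATH.
  local ollama_cmd="${OLLAMA_CMD:-ollama}"
  if command -v "$ollama_cmd" >/dev/null 2>&1; then
    echo "ollama"
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"
  elif [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo "openrouter"
  else
    echo "none"
    return 1   # nothing usable was found
  fi
}
```

Running resolve_backend on a CI runner before the first scheduled check is a cheap way to confirm whether that runner would land on a local model or a paid key.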

Here is the honest caveat, stated plainly because it affects which checks you migrate first. Very small local models (roughly 8B parameters and under) can get flaky on long, multi-step objectives — they lose the thread, repeat a step, or call the run done early. The sweet spot is a mid-size local model in the Qwen3 or Llama 3.3 70B class, or a capable hosted model when the flow is genuinely hard. Match the model to the difficulty of the check and you avoid most of the disappointment.

Checkly vs BrowserBash at a glance

Dimension	Checkly	BrowserBash
Check authoring	Playwright scripts (selectors, page objects)	Plain-English objectives, no selectors
Scheduling	Hosted, managed by Checkly	Your CI / cron (you own the scheduler)
Run locations	Global, multi-region probe network	Wherever your CI or machine runs
Cost model	Paid plans scale with checks and frequency	Free, open source (Apache-2.0); $0 on local models
Where the browser runs	Checkly infrastructure	Local Chrome by default; CDP, Browserbase, LambdaTest, BrowserStack via `--provider`
Alerting / status pages	Built in, managed	Bring your own (CI notifications, exit codes)
Evidence	Hosted run history, traces	Screenshots, `.webm` video, optional free dashboard
Maintenance feel	Keep scripts green as UI changes	Update the objective's intent, not selectors
Account required	Yes	No (optional free dashboard is opt-in)

The table makes the split clear. Checkly's column is heaviest on the hosted side — scheduling, regions, alerting, status pages. BrowserBash's column is heaviest on the authoring and cost side — plain-English checks, no infrastructure to rent, and a real path to a zero-dollar model bill. Migration is mostly about deciding which of those columns matters more for a given check.

Translating a Checkly browser check into a BrowserBash objective

Start with a real Checkly browser check. A common one is a login-to-checkout journey: navigate to the store, fill the email and password fields, click sign in, wait for the dashboard, add an item, go to checkout, fill payment, submit, and assert the confirmation text. In Playwright that is a few dozen lines of page.goto, page.getByRole, page.fill, expect(...).toBeVisible(), plus the waits you added after the flaky parts.

The BrowserBash version collapses that into the intent:

browserbash run "Go to https://shop.example.com, sign in with the test account, \
add the first product to the cart, complete checkout with the saved test card, \
and verify the page shows 'Thank you for your order!'"

The agent opens a real browser, reads each page, and performs the steps. You get a verdict and structured results back. There is no selector to update when the "Sign in" button becomes "Log in", because you never named the button by selector in the first place — you described the goal.

Keep secrets and data out of the prompt with Markdown tests

For anything you commit and run repeatedly, the better authoring unit is a Markdown test rather than a one-off prompt. BrowserBash supports committable *_test.md files where each list item is a step, with @import composition for shared setup and {{variables}} templating. Secret-marked variables are masked as ***** in every log line, which matters when a synthetic check has to log in with real credentials. After each run it writes a human-readable Result.md, which is a clean artifact to attach to a CI job or a paging incident.

Running a checkout_test.md from the command line might look like this:

browserbash testmd run ./checkout_test.md \
  --var email='{{TEST_EMAIL}}' \
  --var password='{{secret:TEST_PASSWORD}}'

The secret: marker means the password never appears in logs. This is the closest analog to how you would store Checkly check secrets, except the test file lives in your repo right next to the code it monitors — the same "monitoring as code" instinct, expressed in plain English instead of Playwright. If you are coming from Checkly's CLI-driven, repo-committed checks, this will feel familiar in spirit. The learn section walks through the Markdown test format in more depth.
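The command above assumes a checkout_test.md already exists. As a rough sketch of what such a file could contain, the snippet below writes one with a shell heredoc. The only facts taken from the description above are that each list item is a step and that {{variables}} are templated in. The heading, the step wording, the write_checkout_test helper, and the assumption that the file refers to the values as {{email}} and {{password}} are illustrative guesses, and @import is left out because its syntax is not shown here.

```shell
# Hypothetical helper: writes an example checkout_test.md into a directory.
# The file layout is an assumption; only "list item = step" and
# "{{variables}}" come from the description above.
write_checkout_test() {
  local target_dir="${1:-.}"
  mkdir -p "$target_dir"
  cat > "$target_dir/checkout_test.md" <<'EOF'
# Checkout smoke check

- Go to https://shop.example.com
- Sign in with email {{email}} and password {{password}}
- Add the first product to the cart
- Complete checkout with the saved test card
- Verify the page shows 'Thank you for your order!'
EOF
  echo "$target_dir/checkout_test.md"
}
```

Treat this as a starting shape and confirm the exact format against the tool's own documentation before committing it.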

Scheduling: the honest gap, and how to close it

This is the part where Checkly is straightforwardly the stronger product, and I am not going to pretend otherwise. Checkly is a scheduler with a global probe network. It runs your checks on an interval from multiple regions and pages you when they fail, all managed. BrowserBash does not host anything and does not run your checks for you on a timer. It is a CLI. If you need someone else to own the cron host and the multi-region execution, that hosted layer is real value and a legitimate reason to keep paying for it.

What BrowserBash gives you instead is a check that runs cleanly in your CI, which is often where you already have scheduling. Agent mode is the bridge. The --agent flag emits NDJSON — one JSON event per line on stdout — with no prose to parse. Exit codes are unambiguous: 0 passed, 1 failed, 2 error, 3 timeout. That is exactly what a CI step or a paging script wants.

browserbash run "Sign in and verify the dashboard loads with today's revenue tile" \
  --agent --headless
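Because the exit codes are fixed, a paging script can branch on them without reading any output. The sketch below is one way to turn the four documented codes into labels. classify_exit is a hypothetical helper name, and treating any other code as "unknown" is an assumption.

```shell
# Map the documented exit codes to labels a pager or CI step can act on.
# classify_exit is a hypothetical helper, not part of the BrowserBash CLI.
classify_exit() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed" ;;
    2) echo "error" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;  # assumption: anything else is treated as unknown
  esac
}

# Usage (commented out so sourcing this sketch has no side effects):
# browserbash run "Sign in and verify the dashboard loads" --agent --headless
# status="$(classify_exit "$?")"
# [ "$status" = "passed" ] || echo "synthetic check $status" >&2
```

Keeping "failed" separate from "error" and "timeout" lets you page on a broken journey while routing monitor-side problems to a quieter channel.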

Wire that into a scheduled GitHub Actions workflow, a Jenkins timer, or a GitLab scheduled pipeline, and you have a synthetic check on an interval — using infrastructure you already pay for, with no per-check meter. The trade is explicit. You give up Checkly's managed global probes and gain a free, repo-owned check whose model bill can be zero. For a single-region smoke test that runs every fifteen minutes from your CI region, that trade is usually a clear win. For a "must alert within two minutes from three continents" SLA check, Checkly's hosted network is the better tool, and the grown-up move is to keep that handful of checks where they belong.
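Whichever scheduler you pick, the step it runs can be a thin wrapper that keeps the NDJSON stream for later inspection and passes the exit code through untouched. In this sketch run_scheduled_check, the BROWSERBASH_BIN override, and the log directory layout are assumptions for illustration; only run, --agent, and --headless come from the command shown earlier.

```shell
# Hypothetical wrapper for a cron entry or a scheduled CI step.
# Usage: run_scheduled_check "<objective>" [log_dir]
run_scheduled_check() {
  local objective="$1"
  local log_dir="${2:-./check-logs}"
  local runner="${BROWSERBASH_BIN:-browserbash}"   # override point for dry runs
  local log_file status

  mkdir -p "$log_dir"
  log_file="$log_dir/check-$(date -u +%Y%m%dT%H%M%SZ).ndjson"

  # In agent mode stdout carries one JSON event per line; keep it verbatim.
  "$runner" run "$objective" --agent --headless > "$log_file"
  status=$?

  echo "$log_file"
  return "$status"   # 0 passed, 1 failed, 2 error, 3 timeout
}

# Example crontab line (every fifteen minutes); the script path is made up:
# */15 * * * * /opt/checks/dashboard-check.sh
```

The wrapper prints the log path so the calling job can attach that file to a notification when the return code is non-zero.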

A pragmatic split

Most teams I have worked with do not migrate everything. They split by criticality. The genuinely critical, externally-facing, multi-region paging checks — the ones tied to an SLA — stay on a hosted platform like Checkly. The long tail of "did this flow break in the last deploy" checks, the ones that really want to live next to the code and run in CI on every merge or on a cheap schedule, move to BrowserBash. You stop paying per-check for the long tail, and you keep the hosted guarantees exactly where they earn their cost.

Where the browser runs, and why that flexibility helps monitoring

By default BrowserBash drives your local Chrome, which is perfect for development and for CI runners that already have Chromium. But monitoring sometimes needs more than the box your CI happens to give you, and that is what the --provider flag covers. It switches where the browser actually runs without changing your objective at all.

You can point at any DevTools endpoint with cdp, or run on a cloud browser grid through browserbase, lambdatest, or browserstack. That last group matters for synthetic monitoring specifically, because it is how you get coverage on browser and OS combinations your CI runner does not have, or how you reach a target from a different network vantage point.

browserbash run "Open the marketing site and confirm the pricing page renders three plan tiers" \
  --provider lambdatest --agent
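Since only the flag changes, running one objective across several grids is a short loop. The sketch assumes each provider's credentials are already configured in whatever way that grid requires, which is not covered here. run_on_providers and BROWSERBASH_BIN are hypothetical names.

```shell
# Run one objective on several providers and print a status per provider.
# Usage: run_on_providers "<objective>" provider [provider ...]
run_on_providers() {
  local objective="$1"; shift
  local runner="${BROWSERBASH_BIN:-browserbash}"
  local failures=0 provider

  for provider in "$@"; do
    # The NDJSON stream is discarded here so only the summary is printed.
    if "$runner" run "$objective" --provider "$provider" --agent >/dev/null; then
      echo "$provider: passed"
    else
      echo "$provider: exit $?"
      failures=$((failures + 1))
    fi
  done

  [ "$failures" -eq 0 ]   # non-zero return if any provider did not pass
}
```

A call such as run_on_providers "Open the marketing site and confirm the pricing page renders three plan tiers" browserbase lambdatest browserstack would then give one line per grid.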

The objective is identical to the local version. Only the execution surface changed. That is a deliberate design choice: the check you wrote on your laptop is the same check that runs in CI and the same check that runs on a cloud grid. There is also a choice of engines under the hood — stagehand (the default, MIT-licensed, from Browserbase) and builtin (an in-repo Anthropic tool-use loop) — but for a straight migration you can leave the default in place and not think about it.

Evidence and debugging when a check fails

A monitoring check is only as useful as the evidence it leaves when it breaks. Checkly's strength here is hosted run history and traces you can scrub through in its UI. BrowserBash's answer is local-first and free.

The --record flag captures a screenshot and a full .webm session video — recorded via ffmpeg — on any engine, so when a scheduled check fails at 3 a.m. you have a video of exactly what the agent saw. On the builtin engine you additionally get a Playwright trace you can open in the trace viewer, which is the same debugging surface a Checkly Playwright check would give you.

browserbash testmd run ./checkout_test.md --record --agent

If you want a dashboard view rather than loose files, you have two options, both free. browserbash dashboard runs a fully local dashboard on your own machine. Or you can opt in to the free cloud dashboard with browserbash connect and add --upload to a run; that gives you run history, video recordings, and per-run replay in a hosted view. The cloud dashboard is strictly opt-in — no account is needed to run BrowserBash at all — and free uploaded runs are kept for 15 days. For a team migrating off Checkly that wants some hosted visibility without the hosted bill, the opt-in dashboard is a reasonable middle ground. You can see the plan details on the pricing page.

A realistic migration plan

If you are actually moving checks, here is the sequence I would follow rather than a big-bang rewrite.

1. Inventory and rank your existing checks

List every Checkly browser check and rank it on two axes: how critical it is (does it page someone, is it tied to an SLA) and how hard the flow is (number of steps, dynamic content, auth). Critical-and-hard checks are the ones to migrate last or leave on Checkly. Low-criticality checks of low to medium difficulty are your migration on-ramp.
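The ranking can be kept as a small, reviewable rule rather than a judgment made fresh each time. The sketch below encodes the two cases named above. The 1-to-3 scoring scale, the migration_bucket name, and the handling of every other combination are assumptions you would tune to your own inventory.

```shell
# Usage: migration_bucket <criticality 1-3> <difficulty 1-3>
# 3 = pages someone or is SLA-bound / a long, dynamic, authenticated flow.
migration_bucket() {
  local criticality="$1" difficulty="$2"
  if [ "$criticality" -ge 3 ] && [ "$difficulty" -ge 3 ]; then
    echo "keep-hosted-or-migrate-last"
  elif [ "$criticality" -le 1 ] && [ "$difficulty" -le 2 ]; then
    echo "migrate-first"
  else
    echo "migrate-later"   # assumption: everything in between waits its turn
  fi
}
```

For example, migration_bucket 1 2 prints migrate-first, while migration_bucket 3 3 prints keep-hosted-or-migrate-last.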

2. Re-express the easy checks as objectives

Take the simplest few — a homepage smoke check, a login check, a pricing-page render check — and write them as plain-English objectives. Run them locally with your default model first. If a mid-size local model handles them cleanly, you have a $0 monitor. If a particular flow is flaky on a small model, that is your signal to either bump to a larger local model or use a hosted model for that one check.
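One way to judge whether a model handles a check cleanly is to repeat the same objective several times and count the passes before trusting it as a monitor. The helper below is a sketch: pass_rate, BROWSERBASH_BIN, and the default of five runs are assumptions, and it relies only on the documented exit code 0 meaning passed.

```shell
# Usage: pass_rate "<objective>" [runs]
# Prints "<passes>/<runs>"; returns non-zero unless every run passed.
pass_rate() {
  local objective="$1" runs="${2:-5}"
  local runner="${BROWSERBASH_BIN:-browserbash}"
  local passes=0 i=0

  while [ "$i" -lt "$runs" ]; do
    if "$runner" run "$objective" --agent --headless >/dev/null; then
      passes=$((passes + 1))
    fi
    i=$((i + 1))
  done

  echo "$passes/$runs"
  [ "$passes" -eq "$runs" ]
}
```

A result below a clean sweep on a small model is the signal described above: move that one check to a larger local model or a hosted one, then measure again.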

3. Commit them as Markdown tests

Convert the objectives that survive into *_test.md files with {{variables}} for data and secret: markers for credentials. Use @import to share login setup across checks so you are not repeating the sign-in steps in every file. Now your monitors are versioned next to your application code.

4. Schedule them in CI with agent mode

Add a scheduled CI workflow that runs the Markdown tests with --agent. Map exit code 1 to a failed check, treat 2 (error) and 3 (timeout) as problems with the monitor run that also deserve a notification, and wire your existing CI notifications to the result. Add --record so every failure leaves video evidence. This is where you replace Checkly's hosted scheduling with your own, for the checks where that trade makes sense.
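The CI step itself can be a short script that runs every committed Markdown test and fails the job if any check does not pass. In this sketch run_all_checks, BROWSERBASH_BIN, and the checks/ directory are hypothetical; the testmd run, --agent, and --record pieces come from the commands shown earlier.

```shell
# Usage: run_all_checks [dir]   (default: ./checks)
# Runs every *_test.md in dir; returns non-zero if any run exits non-zero.
run_all_checks() {
  local dir="${1:-./checks}"
  local runner="${BROWSERBASH_BIN:-browserbash}"
  local bad=0 file rc

  for file in "$dir"/*_test.md; do
    [ -e "$file" ] || continue   # no matches: the glob stays literal, skip it
    "$runner" testmd run "$file" --agent --record && rc=0 || rc=$?
    # 1 = failed, 2 = error, 3 = timeout: all are reported
    if [ "$rc" -ne 0 ]; then
      echo "ALERT $file exit=$rc" >&2
      bad=$((bad + 1))
    fi
  done

  [ "$bad" -eq 0 ]
}
```

Because the function returns non-zero when any check misbehaves, the CI job fails and your existing failure notifications fire without extra wiring.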

5. Keep the SLA checks where they belong

Be disciplined about not migrating the multi-region, must-page checks unless you have a real plan for global execution and alerting. Honesty about this boundary is what keeps the migration credible. The goal is a smaller, cheaper Checkly footprint plus a fast-growing set of free, repo-owned AI checks — not a religious all-or-nothing move. There is a worked example of an end-to-end flow in the case study if you want to see one all the way through.

When to choose each tool

Choose Checkly when your primary need is hosted, multi-region synthetic monitoring with managed alerting and status pages, and you are willing to write and maintain Playwright scripts to get it. If your checks carry an SLA, must fire from several geographies, and you do not want to operate any scheduling infrastructure, Checkly is doing real work you would otherwise have to build. That is not a consolation prize — it is the right answer for that requirement.

Choose BrowserBash when you want code-light, plain-English checks that live in your repo, run in the CI you already pay for, and can hit a $0 model bill on local models. It shines for the long tail of per-deploy and per-merge flow checks, for teams who would rather describe intent than maintain selectors, and for anyone who wants their browser checks to be free and open source. The natural-language authoring also lowers the barrier for non-SDETs to contribute a check, which a Playwright-only workflow rarely does.

Most teams land on both: a lean set of critical hosted checks, plus a growing library of free AI-driven checks in CI. The migration is not about replacing one with the other. It is about moving each check to the column where it is cheapest and easiest to own.

FAQ

Is BrowserBash a drop-in replacement for Checkly?

No, and it does not claim to be. BrowserBash replaces the authoring and execution of browser checks with plain-English objectives, but it is a CLI, not a hosted scheduler. Checkly's managed multi-region scheduling, alerting, and status pages have no direct equivalent in BrowserBash, so you provide scheduling through your own CI. For many checks that is a fine trade; for SLA-bound, multi-geo paging checks, Checkly is still the better fit.

Can I run AI browser monitoring for free?

Yes. BrowserBash is free and open source under Apache-2.0, and it is Ollama-first, so it defaults to local models with no API keys and nothing leaving your machine. That means a genuine $0 model bill when you run local models. The honest caveat is that very small local models can be unreliable on long flows, so use a mid-size local model or a capable hosted model for the hard checks.

How do I keep login credentials out of my monitoring logs?

Use BrowserBash Markdown tests with variables and the secret: marker, for example --var password='{{secret:TEST_PASSWORD}}'. Secret-marked variables are masked as ***** in every log line, so credentials never appear in console output, the written Result.md, or uploaded run records. This is the recommended pattern for any committed check that has to authenticate.

Will an AI browser check be as reliable as a Playwright script?

It depends on the flow and the model. For short and medium journeys, an objective is often more resilient than a selector-based script because it targets intent rather than a specific DOM node, so small UI changes do not break it. For long multi-step flows on a small local model, reliability drops, which is why the guidance is to use a mid-size or hosted model there and to attach --record so every failure leaves video evidence (plus a Playwright trace on the builtin engine) you can review.

Ready to move your first check across? Install with npm install -g browserbash-cli, re-express one Checkly browser check as a plain-English objective, and run it locally for free. When you want hosted run history and replay, the free dashboard is one opt-in step away — sign up at browserbash.com/sign-up (an account is optional, and you never need one just to run a check).