BrowserBash

The math: what does 500 browser tests a day actually cost?

BrowserBash drives a real browser from plain English. The browser itself is basically free; the cost is the LLM. Here's the monthly bill for the same suite run daily, comparing the big APIs, cheap open models and self-hosting. Prices are pulled live from OpenRouter; token counts come from real BrowserBash runs.

Your assumptions

Defaults are measured from real BrowserBash runs (a TTACart login + e2e checkout): ~60k input + ~3k output tokens per test. Edit anything to recompute.

Tests / day

Input tokens / test

Output tokens / test

Days / month

Monthly cost by model

Sorted from most to least expensive. Savings are measured against Claude Sonnet (a typical "good default"). Self-hosted rows count electricity only, once the hardware is paid for.

★ popular picks: DeepSeek, open/free models and local self-hosting (cheap community favorites)
✓ tested to drive the agent

Model	$/M input	$/M output	$/test	$/day	$/month	vs Claude Sonnet
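Each column in the table is simple arithmetic on the assumptions above. Here is a minimal Python sketch of that calculation, assuming prices are quoted per million tokens and a 30-day month (the page's Days / month default isn't shown here, so 30 is our assumption). The names `Assumptions`, `cost_per_test`, `cost_per_day`, `monthly_cost` and `savings_vs` are illustrative, not part of BrowserBash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    tests_per_day: int = 500              # suite size from the page title
    input_tokens_per_test: int = 60_000   # measured default (~60k)
    output_tokens_per_test: int = 3_000   # measured default (~3k)
    days_per_month: int = 30              # assumption: page default not shown


def cost_per_test(price_in_per_m: float, price_out_per_m: float, a: Assumptions) -> float:
    """Dollars for one test: tokens divided by 1M, times the per-million price."""
    input_cost = a.input_tokens_per_test / 1_000_000 * price_in_per_m
    output_cost = a.output_tokens_per_test / 1_000_000 * price_out_per_m
    return input_cost + output_cost


def cost_per_day(price_in_per_m: float, price_out_per_m: float, a: Assumptions) -> float:
    return cost_per_test(price_in_per_m, price_out_per_m, a) * a.tests_per_day


def monthly_cost(price_in_per_m: float, price_out_per_m: float, a: Assumptions) -> float:
    return cost_per_day(price_in_per_m, price_out_per_m, a) * a.days_per_month


def savings_vs(baseline_monthly: float, monthly: float) -> float:
    """Fraction saved relative to a baseline row such as Claude Sonnet (0.75 = 75% cheaper)."""
    return 1.0 - monthly / baseline_monthly


if __name__ == "__main__":
    a = Assumptions()
    price_in, price_out = 0.098, 0.196  # DeepSeek V4 Flash, $/M tokens as quoted on this page
    print(f"${cost_per_test(price_in, price_out, a):.4f}/test, "
          f"${cost_per_day(price_in, price_out, a):.2f}/day, "
          f"${monthly_cost(price_in, price_out, a):.2f}/month")
```

With the DeepSeek V4 Flash prices quoted below, this prints `$0.0065/test, $3.23/day, $97.02/month` under these defaults. The Sonnet comparison is just `savings_vs(sonnet_monthly, model_monthly)` with both monthly figures computed the same way.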

Grounded in real runs

These aren't guesses. We ran the TTACart suite through BrowserBash on several cheap models:

DeepSeek V4 Flash: login passed (112s) and the full 20-step checkout passed on the first try in 88s, using batched form-fill. Input $0.098/M · output $0.196/M · 1M-token context.
gpt-oss-120b (free): also passed, but it was slower and needed a retry on checkout.
Tencent HY3 Preview: cheapest on price, but it failed, returning no usable tool calls before timing out. Cheapest ≠ usable.

The lesson: a low price (or a "supports tools" flag) doesn't mean a model can drive an agent. Always test before you trust the row — the "drives the agent" badges above are from real BrowserBash runs.

Self-hosted Gemma on a 128 GB Mac

A Mac with an M4 Max + 128 GB unified memory can run Gemma-class models (e.g. Gemma 3 27B) locally with Ollama. After buying the box, the LLM is electricity only — ~$0 per token.

Hardware: ~$4,700 one-time (amortizes to ~$130/mo over 3 yrs)
Power: ~$10–20/mo running near-continuously
Marginal cost per test ≈ $0
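The hardware line above is straight-line amortization. A small Python sketch makes the all-in numbers explicit, assuming the page's ~$4,700 price, a 36-month life, the ~$10–20/mo power range and a 30-day month; the function names are illustrative.

```python
def self_hosted_monthly(hardware_usd: float, months: int, power_usd_per_month: float) -> float:
    """Straight-line amortized hardware plus electricity, in dollars per month."""
    return hardware_usd / months + power_usd_per_month


def all_in_cost_per_test(monthly_usd: float, tests_per_day: int, days_per_month: int = 30) -> float:
    """Spread the fixed monthly cost over every test run that month."""
    return monthly_usd / (tests_per_day * days_per_month)


if __name__ == "__main__":
    for power in (10, 20):  # the page's ~$10-20/mo electricity range
        monthly = self_hosted_monthly(4_700, 36, power)
        per_test = all_in_cost_per_test(monthly, tests_per_day=500)
        print(f"power ${power}/mo -> ${monthly:.2f}/mo, ${per_test:.4f}/test")
```

That works out to about $140.56–150.56/mo, or roughly $0.009–0.010 per test once the hardware is counted. The marginal cost of one more test still stays ≈ $0.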

Caveat: one Mac serves the model serially, so realistic throughput is a few hundred agent runs/day; ~500/day is doable on a single box. Scaling to thousands/day calls for a cloud model or a small GPU fleet.
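Whether a suite fits on one serial box comes down to a seconds-per-day budget. This Python sketch assumes runs never overlap. The 150 s per run and 90% utilization in the example are hypothetical figures for a local Gemma model, not measurements (the 88–112s timings below came from a cloud model).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def serial_runs_per_day(seconds_per_run: float, utilization: float = 1.0) -> int:
    """How many agent runs one box can finish per day when it serves one run at a time."""
    return int(SECONDS_PER_DAY * utilization // seconds_per_run)


def fits_on_one_box(tests_per_day: int, seconds_per_run: float, utilization: float = 1.0) -> bool:
    return tests_per_day <= serial_runs_per_day(seconds_per_run, utilization)


if __name__ == "__main__":
    # Hypothetical local timing: 150 s per run, box busy 90% of the day.
    print(serial_runs_per_day(150, 0.9), fits_on_one_box(500, 150, 0.9))
```

Under those assumptions one box manages about 518 runs/day, so 500/day fits and thousands/day do not. Halving the per-run time roughly doubles capacity, which is the lever a faster model or GPU buys you.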

The real winner: caching

You're running the same suite every day. A browser agent shouldn't re-pay the LLM for an identical flow.

First run: the agent figures out the steps, pays the LLM, records the concrete actions.
Daily reruns: replay the cached steps with zero LLM calls; only re-plan when the page actually changes.
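Caching is still on the roadmap, so what follows is a hypothetical sketch of the record-and-replay idea, not BrowserBash's design. It assumes the agent can take a text snapshot of the page (for example its accessibility tree) and keys the recorded actions on the flow name plus a hash of that snapshot, so any page change forces a re-plan. `StepCache`, `Planner` and `plan_with_llm` are made-up names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical planner signature: (flow, page_snapshot) -> concrete actions.
Planner = Callable[[str, str], list[str]]


@dataclass
class StepCache:
    """Record concrete actions once per (flow, page fingerprint); replay them afterwards."""
    recorded: dict[tuple[str, str], list[str]] = field(default_factory=dict)
    llm_calls: int = 0

    @staticmethod
    def fingerprint(page_snapshot: str) -> str:
        # Any change to the snapshot text changes the hash, which triggers a re-plan.
        return hashlib.sha256(page_snapshot.encode("utf-8")).hexdigest()

    def actions_for(self, flow: str, page_snapshot: str, plan_with_llm: Planner) -> list[str]:
        key = (flow, self.fingerprint(page_snapshot))
        if key in self.recorded:
            return self.recorded[key]  # replay: zero LLM calls
        self.llm_calls += 1  # first run, or the page changed: pay the LLM once
        actions = plan_with_llm(flow, page_snapshot)
        self.recorded[key] = actions
        return actions


if __name__ == "__main__":
    def fake_planner(flow: str, page: str) -> list[str]:
        return [f"open {flow}", "click #submit"]

    cache = StepCache()
    for page in ["tree-v1", "tree-v1", "tree-v1", "tree-v2"]:
        cache.actions_for("login", page, fake_planner)
    print(cache.llm_calls)  # two plans: one for tree-v1, one after the page changed
```

Hashing the whole snapshot is deliberately strict: a harmless change such as a new timestamp also forces a re-plan, so a real implementation would likely fingerprint only the elements the flow actually touches.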

For a fixed suite that reruns every day, caching pushes the monthly cost toward ≈ $0 regardless of model. It's on the BrowserBash roadmap.
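How close to $0 depends on how often pages change. Here is a rough Python sketch, assuming only re-planned runs pay the LLM and replays cost nothing. The 5% re-plan rate is hypothetical, the 30-day month is our assumption, and the per-test price is the DeepSeek V4 Flash figure computed from the page's defaults (60k in + 3k out at $0.098/M and $0.196/M).

```python
def monthly_llm_cost_with_cache(
    per_test_usd: float,
    tests_per_day: int,
    days_per_month: int,
    replan_rate: float,
) -> float:
    """Steady-state LLM spend when only the fraction `replan_rate` of runs re-plan.

    Excludes the one-time cost of the first recording run.
    """
    if not 0.0 <= replan_rate <= 1.0:
        raise ValueError("replan_rate must be between 0 and 1")
    return per_test_usd * tests_per_day * days_per_month * replan_rate


if __name__ == "__main__":
    per_test = 60_000 / 1e6 * 0.098 + 3_000 / 1e6 * 0.196  # about $0.0065
    for rate in (1.0, 0.05, 0.0):
        print(f"re-plan {rate:.0%}: ${monthly_llm_cost_with_cache(per_test, 500, 30, rate):.2f}/mo")
```

Uncached this is about $97/mo; at a 5% re-plan rate it drops to about $4.85/mo, and with no page changes it goes to $0. The cost now scales with how often the site changes rather than with how often you run the suite.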

* Free tier: gpt-oss, Qwen and Llama have :free variants on OpenRouter. They cost $0 but are rate-limited (fine for dev and low volume, not for sustained thousands of runs per day). Numbers are rough estimates (±2×). Browser-agent cost is dominated by re-sending the page accessibility tree at every step, so your real token count depends on page size and flow length. Prices are fetched live from OpenRouter and change over time. This page is for planning, not a quote.
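The point about re-sending the accessibility tree can be made concrete. This Python sketch assumes a fixed tree size per step. The 3,000-token tree is a hypothetical figure chosen so that a 20-step flow lands on the ~60k default, and `keep_history=True` models an agent that also re-sends every earlier tree in its conversation history.

```python
def input_tokens_per_test(
    steps: int,
    tree_tokens: int,
    prompt_tokens: int = 0,
    keep_history: bool = False,
) -> int:
    """Estimate input tokens for one test when every step re-sends the page tree.

    With keep_history, step k also carries the k - 1 earlier trees, so the
    total grows quadratically with flow length instead of linearly.
    """
    total = 0
    for step in range(1, steps + 1):
        trees_sent = step if keep_history else 1
        total += prompt_tokens + trees_sent * tree_tokens
    return total


if __name__ == "__main__":
    print(input_tokens_per_test(20, 3_000))                     # one tree per step
    print(input_tokens_per_test(20, 3_000, keep_history=True))  # full history re-sent
```

This prints 60000 and 630000. Doubling the page size doubles either estimate; doubling the flow length doubles the first but roughly quadruples the history-keeping one, which is why flow length matters as much as page size.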