Auto-wait handles late elements by polling the page until the target is actually ready to be acted on, up to a fixed ceiling, instead of guessing how long to pause. BrowserBash leans on Playwright's built-in auto-wait for this: before every click, type, or assertion, the underlying engine checks that the element exists, is visible, is stable, and is enabled, and it keeps re-checking for up to 15 seconds. If the button renders after 300 milliseconds, the action fires at roughly 300 milliseconds. If it takes 4 seconds, the action waits 4 seconds and then fires. You never write sleep 2 or a fixed timeout in a step, because the wait is adaptive: it lasts exactly as long as the element needs and no longer, bounded by that 15-second cap so a genuinely broken page fails fast instead of hanging your run forever.
That single mechanism removes the most common source of flakiness in browser tests. The rest of this article explains why fixed sleeps are the wrong tool, how the 15-second ceiling bounds both the happy path and the failure path, what auto-wait actually checks before it lets an action proceed, how to write BrowserBash tests that take advantage of it, and the honest limits: the cases where content genuinely arrives later than the ceiling and what you do about them.
Why fixed sleeps cause flaky tests
A manual sleep is a bet against a number you cannot know. When you write "wait 2 seconds for the modal," you are encoding an assumption about render time that was true on your laptop, on that day, against that build. The bet loses in two directions, and both are bad.
If the sleep is too short, the test is flaky. CI runners are slower than your machine, the network jitters, a cold cache or a heavy first paint pushes the modal past your 2-second guess, and the step fails even though the application is perfectly healthy. You get a red build that is not a real bug, you re-run it, it passes, and trust in the suite erodes. Flake from short sleeps is the classic "works on my machine" failure mode dressed up as a test result.
If the sleep is too long, the test is slow. To defend against the slowest case you pad every sleep, and now a suite of 80 steps carries 80 fixed pauses that mostly wait on nothing. At 2 seconds each, that is 160 seconds of padding: a flow that could finish in 20 seconds takes three minutes, and that cost is paid on every single run, forever, including the thousands of runs where the element was ready instantly.
The deeper problem is that a sleep waits on the clock, not on the page. It has no idea whether the element appeared. It will happily "succeed" against a frozen page and "fail" against a healthy-but-slow one. Auto-wait inverts this: it waits on the actual condition you care about (is this element ready to be acted on?) and the clock is only a safety bound. That is why the right number of manual sleeps in a BrowserBash test is zero. We cover the broader flake picture in "Reduce flaky end-to-end tests," but for late elements specifically, deleting the sleep and trusting auto-wait is the whole move.
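To make the contrast concrete, here is a minimal Python sketch of waiting on a condition with a ceiling. It is not BrowserBash or Playwright code: `wait_until` and `ready_after` are hypothetical names, and the 50-millisecond polling interval is an assumption chosen for illustration.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 15.0,
               interval_s: float = 0.05) -> float:
    """Poll `condition` until it is true and return the seconds actually waited.

    Raises TimeoutError once `timeout_s` has elapsed, so a page that never
    becomes ready produces a named failure instead of hanging.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)


def ready_after(delay_s: float) -> Callable[[], bool]:
    """Hypothetical stand-in for 'the element rendered': true after delay_s."""
    ready_at = time.monotonic() + delay_s
    return lambda: time.monotonic() >= ready_at


if __name__ == "__main__":
    # The wait ends shortly after the condition flips, not after a fixed pause.
    print(f"waited {wait_until(ready_after(0.3)):.2f}s")
```

The time spent tracks the condition rather than a guess, and the only fixed number left is the ceiling.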
What auto-wait actually checks
Auto-wait is not a single "does it exist" check. Before Playwright performs an action on an element, it runs a set of actionability checks and retries them until they all pass or the timeout fires. The important ones for late-rendering UI are:
- Attached: the element is present in the DOM. This is the one that handles content rendered after a fetch, a route change, or a client-side framework finishing its work.
- Visible: the element has a non-empty bounding box and is not hidden by visibility: hidden. An element with display: none has no bounding box, so it also counts as not visible. This handles skeleton-to-content swaps and content that is revealed after load. Note that Playwright treats a zero-opacity element as visible, so a pure opacity fade-in is not held back by this check.
- Stable: the element is not still moving. If a panel is sliding into place, the wait holds until its position settles between animation frames, so the click does not land where the button used to be.
- Enabled: for interactive elements, the control is not disabled. This is what makes "wait for the submit button to become clickable after the form validates" just work without a sleep.
- Receives events: the element is the actual hit target at the point you would click, not covered by an overlay or a spinner.
Because BrowserBash drives a real Chrome through this engine, every action your test takes inherits all of these checks for free. When the agent decides to click a "Continue" button, it is not racing the page. It is asking the browser to click as soon as that specific button is attached, visible, stable, enabled, and unobstructed, and to keep trying until it is, up to the ceiling.
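The checks listed above can be pictured as a single predicate that is re-evaluated until nothing fails. The sketch below is illustrative only: `ElementState` is a hypothetical snapshot type invented for this example, not a Playwright or BrowserBash object, and the real engine evaluates these conditions against the live page.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class ElementState:
    """Hypothetical snapshot of one element at one instant."""
    attached: bool
    box: Box
    previous_box: Box        # box sampled on the preceding animation frame
    visibility_hidden: bool  # computed style is visibility: hidden
    enabled: bool
    is_hit_target: bool      # False when an overlay or spinner covers the click point


def failing_checks(el: ElementState) -> List[str]:
    """Return the names of checks that do not pass yet (empty list = actionable)."""
    failures = []
    if not el.attached:
        failures.append("attached")
    if el.visibility_hidden or el.box[2] <= 0 or el.box[3] <= 0:
        failures.append("visible")
    if el.box != el.previous_box:
        failures.append("stable")
    if not el.enabled:
        failures.append("enabled")
    if not el.is_hit_target:
        failures.append("receives events")
    return failures


if __name__ == "__main__":
    # A submit button that is still sliding into place and not yet enabled.
    sliding = ElementState(True, (120, 40, 80, 32), (100, 40, 80, 32),
                           False, False, True)
    print(failing_checks(sliding))  # ['stable', 'enabled']
```

An action proceeds only once the list is empty, which is why a button that is present but still moving or still disabled does not get clicked early.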
The 15-second ceiling: bounding both directions
The ceiling matters as much as the waiting. Auto-wait without a cap would mean a test that hangs indefinitely on a page that never renders the element, which is its own kind of flake (the build that "runs for 45 minutes and then someone kills it"). BrowserBash sets the auto-wait ceiling at 15 seconds per action.
On the happy path, the ceiling is invisible. An element that renders in 800 milliseconds is acted on at 800 milliseconds. The test only ever consumes the time the page actually needs, so fast pages run fast. You are not paying 15 seconds; you are paying "however long this element took," and the 15 is just the limit.
On the failure path, the ceiling is what makes failures clean. If a button genuinely never appears (the API errored, the feature flag is off, the deploy is broken), the action waits the full 15 seconds, gives up, and the step fails with a clear timeout rather than hanging. In CI that maps to a deterministic exit: a failed run returns exit code 1 (a clean fail), or exit code 3 specifically when the run hits a timeout, which lets your pipeline tell "the test asserted something false" apart from "the page never got there." That distinction is hard to get from hand-rolled sleeps, which can only ever fail as a generic assertion error.
So the ceiling does two jobs at once. It bounds the patience of every action so a broken page cannot stall the suite, and it gives the failure a name so your pipeline can react correctly. The trade is that 15 seconds is a real limit, and content that legitimately takes longer needs a different approach, which we get to in the honest-limits section.
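A pipeline can act on that naming with a small wrapper. The following is a sketch under stated assumptions: the exit codes (0 pass, 1 fail, 2 error, 3 timeout) and the `browserbash testmd run` command come from this article, while `classify_exit` and `run_and_classify` are hypothetical helper names.

```python
import subprocess

# Exit codes as listed in this article: 0 pass, 1 fail, 2 error, 3 timeout.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error", 3: "timeout"}


def classify_exit(code: int) -> str:
    """Translate a run's exit code into a label the pipeline can branch on."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_and_classify(test_path: str) -> str:
    """Run one test file through the CLI and return the outcome label.

    Assumes the browserbash CLI is installed and on PATH.
    """
    completed = subprocess.run(["browserbash", "testmd", "run", test_path])
    return classify_exit(completed.returncode)
```

With the label in hand, a CI job can report "the page never got there" separately from "the test asserted something false."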
Writing tests that lean on auto-wait
The practical payoff is that your tests describe intent, not timing. A BrowserBash test is a Markdown *_test.md file: a title, a list of steps in plain English, optional {{variables}}, and @import for composition. There is nowhere to put a sleep, and you do not want one. Here is a login-then-dashboard flow where the dashboard widgets render after an async fetch:
# Dashboard loads after login
1. Go to {{base_url}}/login
2. Fill the email field with {{user_email}}
3. Fill the password field with {{user_password}}
4. Click the "Sign in" button
5. Confirm the heading "Your dashboard" is visible
6. Confirm the "Recent activity" panel shows at least one row
Step 5 and step 6 both depend on content that does not exist at the moment step 4 fires. There is no wait anywhere. The agent locates each target through the accessibility tree (roles, accessible names, states) plus the DOM, and the auto-wait underneath holds each action until that heading and that panel are attached, visible, and stable. If the fetch is slow today, the steps wait. If it is fast, they do not. You run it like this:
browserbash testmd run ./dashboard_test.md
Or for a one-off objective without a file:
browserbash run "log in and confirm the dashboard heading appears"
The same property holds for ad hoc runs. The agent re-derives the right element from the live state of the page on every action, so a button that has not rendered yet is simply not acted on until it has. This is not a cached or saved selector script: on each run the engine works from a fresh snapshot of what is rendered right then, which is also why a late element is handled naturally rather than treated as a missing one. For more on that live-DOM approach, see how BrowserBash handles dynamic UIs.
Composition does not break the wait
Because tests compose with @import, you can factor the slow login into a shared file and reuse it everywhere without losing the auto-wait behavior:
# Checkout as a returning user
@import ./login_test.md
1. Click the "Cart" link
2. Click "Proceed to checkout"
3. Confirm the "Order summary" region is visible
4. Confirm the total is greater than 0
Each imported step and each local step gets the same actionability checks. The "Order summary" region in step 3 might be rendered by a component that mounts after a price-calculation call returns; auto-wait covers that gap the same way it covered the dashboard panel above. You did not have to think about it.
Auto-wait versus the two engines
BrowserBash ships two engines, and both rely on the same Playwright auto-wait for late elements, just wrapped differently. The default engine, stagehand (MIT, by Browserbase), observes the live DOM each step and decides the next action from what is rendered right then, so a not-yet-rendered element is something it simply has not seen yet, not a failure. The alternative builtin engine (an Anthropic tool-use loop) captures native Playwright traces and re-derives the selector on every action from a fresh snapshot, never cached across runs. In both cases the act of clicking or asserting goes through Playwright's actionability checks with the 15-second ceiling, so the late-element behavior is consistent no matter which engine you pick:
# default stagehand engine
browserbash run "confirm the search results list appears after typing 'invoices'"
# builtin engine, useful when you want native Playwright traces
browserbash run "confirm the search results list appears after typing 'invoices'" --engine builtin
The choice between them is about how you want the agent to reason and what artifacts you want, not about whether late elements are handled. They both wait properly.
Auto-wait is not a substitute for asserting
A subtle point: auto-wait makes an action wait for its target, but it does not, by itself, prove the right thing happened. If you only ever click, you are leaning on the wait to mean "the page progressed," which is implicit and fragile. The robust pattern is to follow a slow transition with an explicit confirmation step, because that turns the wait into a checked condition.
1. Click "Generate report"
2. Confirm the text "Report ready" is visible
3. Confirm a "Download" button is visible
Step 1 waits for the button to be clickable. Steps 2 and 3 wait for the actual outcome of the slow operation and assert it. Now the test fails loudly if the report never generates, instead of silently passing because the click "worked." Auto-wait and assertions are partners: the wait gets you to the moment, the assertion checks the moment is correct.
Honest limits: when content exceeds the ceiling
Auto-wait with a 15-second ceiling covers the overwhelming majority of late-rendering UI, but it is a bound, not magic, and you should know exactly where it stops helping.
Genuinely long operations exceed 15 seconds. A report that takes 30 seconds to generate, a video that transcodes for a minute, a heavy export, or a backend job that legitimately runs long will blow past the ceiling. Auto-wait will time out and the step fails with a timeout, which is correct behavior (it is not silently hanging) but it is not the result you wanted. The fix is not a longer blind sleep; it is to wait on a signal the application actually emits. Have the step confirm a progress indicator, a "still working" state, or a final "done" badge, and where the app exposes one, key the assertion to the completion signal rather than a raw duration. Some genuinely long flows are better split so the slow operation is its own focused test. The general headless and timeout mechanics are covered in the headless and timeouts tutorial.
Infinite and lazy content never "finishes." Auto-wait waits for a specific element to be ready. It cannot wait for "all content," because on an infinite-scroll feed or a lazy-loaded image grid there is no all. If your target is below the fold or only mounts on scroll, the element is not late so much as not requested yet, and waiting alone will not summon it. You need a step that triggers the load (scroll to the bottom, click "Load more") and then asserts the specific item. That pattern has its own write-up, "Test lazy-loaded and infinite-scroll pages."
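The trigger-then-assert shape can be sketched in a few lines of Python. Everything here is hypothetical: `FakeFeed` stands in for a paginated page, and `find_with_load_more` and its `max_loads` bound are illustrative names, not part of BrowserBash.

```python
from typing import Callable, List


class FakeFeed:
    """Hypothetical stand-in for a feed that reveals one page per 'Load more'."""

    def __init__(self, all_items: List[str], page_size: int) -> None:
        self.all_items = all_items
        self.page_size = page_size
        self.shown = page_size

    def visible_items(self) -> List[str]:
        return self.all_items[:self.shown]

    def load_more(self) -> bool:
        """Reveal the next page; return False when nothing is left to load."""
        if self.shown >= len(self.all_items):
            return False
        self.shown += self.page_size
        return True


def find_with_load_more(visible_items: Callable[[], List[str]],
                        load_more: Callable[[], bool],
                        target: str,
                        max_loads: int = 10) -> bool:
    """Check for the target, trigger a load if it is absent, and repeat.

    Bounded by max_loads so an endless feed cannot stall the search.
    """
    loads = 0
    while True:
        if target in visible_items():
            return True
        if loads >= max_loads or not load_more():
            return False
        loads += 1
```

The point of the bound is the same as the per-action ceiling: the search ends with a definite answer even when the content never stops arriving.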
The element renders but is the wrong one. Auto-wait checks actionability, not identity. If two buttons share an accessible name and the late one is the one you want, the wait is satisfied by whichever matches first. This is a locator-precision problem, not a timing problem, and the answer is to describe the target more specifically (its region, its surrounding text, its state) so the agent resolves to the intended element. Auto-wait will not save a vague instruction.
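A toy model shows why a more specific description resolves the ambiguity. This sketch assumes a simplified accessibility tree of role and name pairs; `Node`, `find_all`, and `find_in_region` are hypothetical names and do not reflect how the BrowserBash agent is implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified accessibility-tree node."""
    role: str
    name: str
    children: List["Node"] = field(default_factory=list)


def find_all(root: Node, role: str, name: str) -> List[Node]:
    """Return every node under root (inclusive) matching role and name, in document order."""
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role == role and node.name == name:
            matches.append(node)
        stack.extend(reversed(node.children))
    return matches


def find_in_region(root: Node, region_name: str, role: str, name: str) -> List[Node]:
    """Narrow the search to regions with the given accessible name."""
    regions = find_all(root, "region", region_name)
    return [match for region in regions for match in find_all(region, role, name)]


if __name__ == "__main__":
    page = Node("main", "Settings", [
        Node("region", "Profile", [Node("button", "Save")]),
        Node("region", "Billing", [Node("button", "Save")]),
    ])
    print(len(find_all(page, "button", "Save")))                   # 2
    print(len(find_in_region(page, "Billing", "button", "Save")))  # 1
```

"Click Save" matches two candidates here, while "click Save in the Billing region" matches one, and no amount of waiting changes that count.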
Polling has a floor, and animations cost real time. Because the engine re-checks rather than firing instantly, there is a small, bounded cost to every wait, and an element that animates for a full second will hold the action for that second because the "stable" check refuses to click a moving target. This is correct (clicking mid-animation is how you get misfires) but it does mean a heavily animated UI runs a touch slower than a static one. That is a worthwhile trade for not clicking the wrong pixel, but it is a real cost.
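The cost of the stable check can be illustrated by sampling an element's bounding box once per animation frame and looking for the first frame where it matches the frame before it. This is a simplified sketch with a hypothetical function name, not the engine's actual implementation.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


def first_stable_frame(boxes: List[Box]) -> Optional[int]:
    """Return the index of the first frame whose box equals the previous frame's box.

    Returns None if the element is still moving in every sampled frame.
    """
    for i in range(1, len(boxes)):
        if boxes[i] == boxes[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # A panel sliding 100px to the right in 20px steps, then resting.
    frames = [(x, 0, 200, 40) for x in (0, 20, 40, 60, 80, 100, 100)]
    print(first_stable_frame(frames))  # 6
```

In this example the click is held for six frames: the animation has to finish, and then one more frame has to confirm the box did not change.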
Small local models add their own latency, separate from waiting. Auto-wait bounds the browser-side wait, not the model-side think time. On long flows, small local models (8B or smaller) are flaky and can stall reasoning between steps in ways the 15-second per-action browser ceiling does not address. For hard multi-step flows, a 70B-class local model (Qwen3, Llama 3.3) or a hosted model is more reliable. Local keeps everything on your machine; the default auto model resolution tries Ollama first, then ANTHROPIC_API_KEY, then OPENROUTER_API_KEY (free models exist). This is a model-capability limit, not an auto-wait limit, but it shows up in the same place: a step that "takes too long."
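The resolution order described above reads naturally as a fallthrough. The sketch below only mirrors that order: `resolve_provider` and the returned labels are hypothetical, and how the CLI actually detects a running Ollama instance is not covered in this article, so it is passed in as a plain boolean.

```python
from typing import Mapping, Optional


def resolve_provider(env: Mapping[str, str], ollama_available: bool) -> Optional[str]:
    """Pick a provider in the documented order: Ollama, then Anthropic, then OpenRouter.

    The returned labels are illustrative, not real CLI values.
    """
    if ollama_available:
        return "ollama"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    return None
```

Knowing which branch was taken helps when a step "takes too long": a slow local model and a slow page look the same from the outside but call for different fixes.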
The honest summary is that auto-wait solves "the element rendered a bit later than the action," which is the common case, and bounds the failure of "the element never rendered." It does not solve "the operation takes longer than any reasonable per-action ceiling" or "the content was never requested." For those you assert on the application's own signals, and you reach for the patterns in learn rather than a bigger sleep.
FAQ
Do I ever need to add a manual sleep in a BrowserBash test?
In normal use, no. Auto-wait holds each action until the element is attached, visible, stable, and enabled, up to 15 seconds, which covers late-rendering UI without a sleep. The one place people reach for a pause is a genuinely long backend operation, and even there the right answer is to assert on a completion signal the app exposes (a "done" badge, a result row) rather than guessing a duration. A fixed sleep is either too short (flaky) or too long (slow), so it loses both ways.
What happens when the element appears after the 15-second ceiling?
The action times out and the step fails. This is deliberate: the ceiling stops a broken or stuck page from hanging the whole run. In CI you can tell this apart from a normal assertion failure because a timeout maps to exit code 3, while a clean failed assertion is exit code 1. If the content legitimately takes longer than 15 seconds, do not extend a blind wait; restructure the test to confirm the app's own progress and completion states, or split the slow operation into its own focused run.
Does BrowserBash self-correct its selectors when an element loads late?
It does not patch or keep a saved selector script. On every action it re-derives the target from a fresh snapshot of the live page using the accessibility tree (roles, accessible names, states) plus the DOM. A late element is simply one the engine has not seen rendered yet, so once it appears it gets matched normally. There is no cached selector to go stale and nothing being rewritten between runs; each run reads the page as it actually is at that moment.
Does auto-wait work the same in headless CI as it does locally?
Yes. The actionability checks and the 15-second ceiling are the same whether you run with --headless in a pipeline or watch a real window locally. Headless tends to be a touch faster because there is no rendering to a visible surface, but the waiting behavior is identical. For CI you can add --agent to emit NDJSON, --record to capture a webm and screenshots, and rely on the exit codes (0 pass, 1 fail, 2 error, 3 timeout) to gate the build.
Wrapping up
Late elements are a timing problem that fixed sleeps solve badly, because a sleep waits on the clock instead of the page. BrowserBash sidesteps the problem by leaning on Playwright's built-in auto-wait: every click, type, and assertion waits for its specific target to become actionable, taking exactly as long as the element needs, bounded by a 15-second ceiling that keeps a broken page from hanging the run and gives the failure a clean exit code. You write intent in a *_test.md file with zero sleeps, the agent re-derives the right element from the live page on each action, and the wait is handled underneath.
The honest boundary is the ceiling itself. Operations that legitimately run longer than 15 seconds, and content that is never requested (lazy and infinite lists), are not waiting problems and a longer pause will not fix them. For those you assert on the application's own completion signals. Within its lane, which is the common case of "this rendered a little late," auto-wait is the reason a BrowserBash suite stays both fast and stable without a single hand-coded sleep. Install it with npm install -g browserbash-cli and delete your sleeps.