Can ChatGPT control a web browser?

Name: BrowserBash
Author: The Testing Academy

Yes, ChatGPT can control a browser now, but the honest answer is more interesting than a yes or no. If you are asking whether ChatGPT can control a browser, you are really asking three different questions at once: can it click and type on real pages, can it finish a multi-step task without you babysitting it, and can you trust it with your logged-in accounts. The capability shipped in stages across 2025 (first Operator, then ChatGPT agent mode, then the Atlas browser), and each one moved the line a little. This article walks through what those features actually do, where they fall over, and what to reach for when you need an agent that drives a real Chrome you control end to end.

The short answer, and why it is not a clean yes

ChatGPT can control a browser. It clicks, scrolls, types into forms, navigates between pages, and reads what comes back. You give it an English goal — "find a flight under $400 to Lisbon next month and put it in the cart" — and it carries out the steps, pausing when it hits something sensitive like a login or a payment.

The catch is in the word "control." There are three distinct things ChatGPT does today, and people blur them together:

ChatGPT agent mode (the successor to Operator) runs inside a virtual, sandboxed browser hosted by OpenAI. You watch it work in a window, but the browser is not on your machine.
ChatGPT Atlas is a full Chromium browser from OpenAI with ChatGPT built into the sidebar, and an agent mode that acts inside that browser's managed context.
The ChatGPT you type into normally can browse the web (search, open pages, summarize) but does not click and type its way through a task the way agent mode does.

So "can ChatGPT control a browser" is true, but the kind of control, where the browser lives, and how much you are allowed to automate all depend on which surface you mean. Let me take them one at a time.

From Operator to agent mode: a quick timeline

OpenAI's first real browser-driving product was Operator, launched in early 2025 as a research preview for Pro users. Operator ran a model called Computer-Using Agent (CUA) that looked at screenshots of a web page and decided where to click. It was impressive but slow, and it did not reach human-level accuracy on messy interfaces or long workflows.

Operator did not stick around as a standalone product. OpenAI folded its capabilities into ChatGPT agent and deprecated Operator; the standalone operator.chatgpt.com experience was shut down on August 31, 2025. If you read a 2025 tutorial that tells you to go to the Operator site, that path is gone — the same capability now lives behind "agent mode" in the ChatGPT composer.

Then in late 2025, OpenAI shipped ChatGPT Atlas, a Chromium-based browser with ChatGPT woven into the UI and an agent mode that operates inside Atlas itself. That is the current state of play as of 2026: agent mode for task automation, and Atlas if you want the agent living inside your everyday browser.

Knowing this history matters because the three surfaces have different trust models, and a lot of advice online still describes Operator's sandbox as if it were the whole story.

What ChatGPT agent mode can actually do

Agent mode is the closest thing to "ChatGPT controlling a browser" in the product most people have. You switch the composer into agent mode, type a goal, and it works through it: searching, opening pages, evaluating what it sees, and acting. OpenAI describes the loop as plan, then execute safe actions (click, type, scroll), then pause when an action could be sensitive.

In practice it is genuinely useful for a real class of tasks:

Research-and-collect. Open a dozen sources, pull the relevant numbers, and assemble them. This is where it shines, because it inherits the same reasoning ChatGPT is already good at.
Shopping and planning flows. Build a meal plan, turn it into a grocery list, and load a cart for delivery. Booking-style flows where the last click is left to you.
Form-driven busywork. Fill out repetitive forms, work with uploaded files, and edit spreadsheets inside the sandbox.

The key architectural fact: agent mode runs in a managed, sandboxed browser, not on your local machine. OpenAI has been explicit that the sandbox has hard boundaries — it does not read your local file system or other apps, there is no code execution in your local browser, no extension installs, no automatic downloads, and it does not use your saved passwords or autofill by default. When the agent reaches a login or a payment, it hands control back to you ("takeover mode") so a human types the credentials.
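
The plan, act, pause loop described above can be pictured with a small sketch. This is not OpenAI's implementation: the `Action` type, the two sets of action kinds, and the `ask_human` callback are hypothetical names chosen for illustration. The only behavior taken from the text is that safe actions run automatically while logins and payments are handed back to a person.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical action kinds. "login" and "payment" stand in for the
# sensitive steps that trigger a human takeover.
SAFE_KINDS = {"click", "type", "scroll", "navigate"}
SENSITIVE_KINDS = {"login", "payment"}


@dataclass(frozen=True)
class Action:
    kind: str
    target: str


def run_plan(plan: Iterable[Action], ask_human: Callable[[Action], bool]) -> List[str]:
    """Execute safe actions; pause and defer to a human on sensitive ones."""
    log: List[str] = []
    for action in plan:
        if action.kind in SAFE_KINDS:
            log.append(f"agent:{action.kind}:{action.target}")
        elif action.kind in SENSITIVE_KINDS:
            # Takeover: the human performs the step, or declines and the run stops.
            if ask_human(action):
                log.append(f"human:{action.kind}:{action.target}")
            else:
                log.append(f"stopped:{action.kind}:{action.target}")
                break
        else:
            # Unknown action kinds are refused rather than guessed at.
            log.append(f"refused:{action.kind}:{action.target}")
            break
    return log
```

The returned list doubles as a simple action log: each entry records who performed the step, which is the property that makes a supervised agent auditable.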

That sandbox is a deliberate safety design, and it is also the source of most of the friction. Because the agent is not in your browser session, it does not automatically have your cookies, your SSO tokens, or your half-finished cart from this morning. You re-authenticate inside the agent's window, which is the right call for security but means the agent often starts from a cold, logged-out state.

What ChatGPT Atlas adds, and what it changes

Atlas moves the agent into a real Chromium browser that you use day to day. The ChatGPT sidebar can see the page you are on, and agent mode can act inside the browser's context. That solves part of the cold-start problem — the agent is now operating in a browser where you might already be signed in — but it trades that convenience for a bigger blast radius.

That blast radius is the headline concern with Atlas, and it is worth being blunt about it. Security researchers have demonstrated prompt-injection attacks against Atlas: a crafted page, or even a URL-shaped string, can carry hidden instructions that the agent follows instead of your actual goal. Because an in-browser agent inherits your authenticated sessions, a successful injection is not "the agent did the wrong thing on one page" — it is potentially "the agent took an action on every site you are logged into." Independent analyses have also flagged session-inheritance and clipboard-hijacking risks in agent mode.

None of this means Atlas is broken. It means an agent that lives inside your logged-in browser is a fundamentally higher-trust thing than a sandboxed agent in a throwaway session, and you should treat it that way. OpenAI ships sandboxing of agent workflows and an action log so you can see exactly what the agent did, and it pauses on sensitive actions. Use those guardrails. Do not turn an in-browser agent loose on a banking tab and walk away.

The hard limits: where ChatGPT browser control breaks

Across agent mode and Atlas alike, the same walls show up. If you are evaluating ChatGPT for browser automation, these are the ones that will bite you:

CAPTCHAs and bot detection. The agent will not solve CAPTCHAs or defeat bot detectors, and many sites that gate actions behind a CAPTCHA simply stop the agent cold. This is by design and by policy, not a bug, but it rules out a lot of "automate this annoying site" use cases.
Multi-factor authentication. Complex MFA flows frequently force a human takeover. Anything with an SMS code, an authenticator app, or a hardware key needs you in the loop.
Speed. Agent mode is slow. It hesitates on simple UI decisions and grinds through long workflows. For a one-off task that is fine; for anything you want to run on a schedule or in a pipeline, the latency adds up fast.
No clean machine-readable contract. Agent mode is built for a human watching a chat window, not for a script. There is no first-class "emit one JSON object per step, exit non-zero on failure" interface you can wire into CI without scraping the UI.
Paid plans only. Agent mode and Atlas agent features are gated to paid ChatGPT tiers, and the browser itself is hosted infrastructure, so there is a per-task cost and a dependency on OpenAI's servers.
It is not your browser (agent mode) or it is too much of your browser (Atlas). The sandbox isolates you from your real sessions; the in-browser model exposes all of them. Neither is a precise "use this one Chrome profile, headless, in CI" tool.

These are not reasons to dismiss ChatGPT's browser control. They are reasons to know which job it is wrong for. A reporter doing one-off research? Great fit. A QA engineer who needs the same five flows checked on every deploy, headless, with a pass/fail exit code? Wrong tool.

When you need a real browser the agent fully controls

There is a whole category of work that wants the opposite of a hosted sandbox: you want a real Chrome on your machine, driven by an agent, with a machine-readable result and zero data leaving your laptop. Regression checks, login-flow smoke tests, scraping a value out of an internal dashboard, verifying a deploy in CI — these need determinism, a clean exit code, and control over exactly which browser and which session.

That is the gap BrowserBash fills. It is a free, open-source (Apache-2.0) CLI from The Testing Academy that takes the same plain-English-objective idea ChatGPT agent mode popularized and aims it at a browser you run. You install it with npm, write an objective in English, and an AI agent drives a real Chrome step by step — no selectors, no page objects — then returns a verdict plus the structured values it extracted.

npm install -g browserbash-cli
browserbash run "Go to the staging site, log in with the test account, open Billing, and confirm the plan shows 'Pro'"

The model story is the part that diverges most from ChatGPT. BrowserBash is Ollama-first. The default auto model resolves in order: a local Ollama model if one is running (free, no API keys, nothing leaves your machine), then ANTHROPIC_API_KEY for claude-opus-4-8, then OPENAI_API_KEY for openai/gpt-4.1, otherwise it errors with guidance. So you can run the whole thing on your own hardware with a guaranteed $0 model bill, or point it at a hosted model when a flow is genuinely hard. There is a free local dashboard, opened with browserbash dashboard and served on localhost, and cloud upload is strictly opt-in: without --upload, no run data is uploaded.
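
The fallback order is easy to state as code. The sketch below mirrors only the order given above; the function name `resolve_auto_model`, the `ollama_running` flag, and the `"ollama (local)"` label are hypothetical, and how BrowserBash actually detects a running Ollama instance or picks a local model is not covered here.

```python
from typing import Mapping


def resolve_auto_model(env: Mapping[str, str], ollama_running: bool) -> str:
    """Return a model label following the documented fallback order."""
    if ollama_running:
        # Placeholder label: the concrete local model name is not specified.
        return "ollama (local)"
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-opus-4-8"
    if env.get("OPENAI_API_KEY"):
        return "openai/gpt-4.1"
    # Nothing usable was found, so fail loudly with guidance.
    raise RuntimeError(
        "No model available: start Ollama, or set ANTHROPIC_API_KEY or OPENAI_API_KEY"
    )
```

Note that a running local model wins even when API keys are set, which is what "Ollama-first" implies: a hosted model is only used when no local one is available.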

One honest caveat, because the tutorials say it too: very small local models (8B and under) get flaky on long, multi-step objectives. The sweet spot is a mid-size local model in the Qwen3 / Llama 3.3 70B class, or a capable hosted model for the hard flows. The same reasoning ceiling that makes a tiny model wobble is exactly why ChatGPT's hosted models feel smooth — you are renting a much bigger brain.

ChatGPT browser control vs a CLI agent: a side-by-side

These tools overlap on the surface (both take English, both drive browsers) but they are built for different jobs. Here is the honest comparison.

| Dimension | ChatGPT agent mode / Atlas | BrowserBash |
| --- | --- | --- |
| Where the browser runs | Hosted sandbox (agent mode) or Atlas browser; not a plain local Chrome you pick | Your real local Chrome by default; CDP, Browserbase, LambdaTest, BrowserStack optional |
| Interface | Chat window, human in the loop | CLI; `--agent` emits NDJSON, one JSON object per line, with exit codes |
| Built for CI | No first-class machine contract | Yes, designed for CI and AI coding agents |
| Model | OpenAI hosted models, paid plans | Ollama-first and free locally; or Claude / GPT / Gemini / OpenRouter |
| Data privacy | Runs on OpenAI infrastructure | Stays on your machine unless you opt in with `--upload` |
| Cost | Paid ChatGPT tier + hosted per-task cost | Free and open-source; $0 model bill on local Ollama |
| Login handling | Human takeover for auth and payments | You script the login; secrets masked as `*****` in logs |
| CAPTCHAs | Will not solve, often blocked | Same honest limit: no CAPTCHA solving |
| Best at | One-off research, shopping, planning, exploratory tasks | Repeatable checks, regression, scraping, deploy verification in pipelines |
| Committable tests | Not really | Markdown `*_test.md` files with `{{variables}}` and `@import` |

The row that matters most for engineers is the Interface row, with its --agent flag. BrowserBash emits NDJSON: progress events like {"type":"step","step":1,"status":"passed","action":"navigate"} and a terminal {"type":"run_end","status":"passed|failed|error|timeout",...}, plus exit codes (0 passed, 1 failed, 2 error, 3 timeout). That is the contract a CI job or an AI coding agent needs, and it is the thing a chat-window agent fundamentally does not give you.
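
One way a pipeline might consume that stream is sketched below. The event fields (type, step, status, action) and the exit codes come from the text above. The function name `summarize_run` and the choice to treat a missing run_end event as an error are assumptions for illustration, and real output may carry additional fields.

```python
import json
from typing import Iterable, List, Tuple

# Exit codes as listed in the article: 0 passed, 1 failed, 2 error, 3 timeout.
EXIT_CODES = {"passed": 0, "failed": 1, "error": 2, "timeout": 3}


def summarize_run(lines: Iterable[str]) -> Tuple[List[dict], str, int]:
    """Parse NDJSON lines into (step events, final status, matching exit code)."""
    steps: List[dict] = []
    final_status = "error"  # assumed default if no run_end event ever arrives
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between events
        event = json.loads(raw)
        if event.get("type") == "step":
            steps.append(event)
        elif event.get("type") == "run_end":
            final_status = event.get("status", "error")
    # Unknown statuses are mapped to the error code rather than to success.
    return steps, final_status, EXIT_CODES.get(final_status, 2)


if __name__ == "__main__":
    sample = [
        '{"type":"step","step":1,"status":"passed","action":"navigate"}',
        '{"type":"run_end","status":"failed"}',
    ]
    steps, status, code = summarize_run(sample)
    print(len(steps), status, code)  # prints: 1 failed 1
```

Because every line is a complete JSON object, the parser never needs to buffer the whole run, so it can report progress while the browser is still working.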

browserbash run "Open the demo store, add the first product, start checkout, confirm the subtotal is shown" --agent --record

That run writes a screenshot and a .webm session video (the builtin engine also writes a Playwright trace), and prints NDJSON your pipeline can parse without scraping prose. You can read more about the agent-mode output and engines in the docs.
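
A CI step could wrap the command roughly like this. The helper `run_check` and its wall-clock timeout are hypothetical; only the exit-code meanings come from the text. The demo uses a stand-in Python child process instead of the real CLI so the sketch runs without BrowserBash installed.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Exit-code meanings as listed in the article.
VERDICTS = {0: "passed", 1: "failed", 2: "error", 3: "timeout"}


def run_check(command: Sequence[str], timeout_s: float = 600.0) -> Tuple[str, List[str]]:
    """Run a command and return (verdict derived from its exit code, stdout lines)."""
    try:
        proc = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Guard added by this wrapper, separate from the tool's own exit code 3.
        return "timeout", []
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return "error", []
    return VERDICTS.get(proc.returncode, "error"), proc.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in for: ["browserbash", "run", "<objective>", "--agent"]
    fake = [
        sys.executable,
        "-c",
        "import sys; print('{\"type\":\"run_end\",\"status\":\"failed\"}'); sys.exit(1)",
    ]
    verdict, lines = run_check(fake)
    print(verdict, lines)
```

Deriving the verdict from the exit code and keeping stdout for the NDJSON details means a failed parse can never turn a failing run into a passing build.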

Two engines, many providers: how BrowserBash drives the browser

Worth a short detour, because it is the piece ChatGPT does not expose. BrowserBash separates who interprets the English (the engine) from where the browser runs (the provider).

The default engine is Stagehand (MIT, by Browserbase), which gives the agent act/extract/observe/agent primitives and self-heals when the page shifts. There is also a builtin engine — an in-repo Anthropic tool-use loop driving Playwright — which is used automatically for LambdaTest and BrowserStack. You switch with --engine stagehand|builtin.

Providers are set with --provider: local (your Chrome, the default), cdp for any DevTools endpoint via --cdp-endpoint ws://..., browserbase, lambdatest, and browserstack. So the same English objective can run against your laptop's Chrome for a quick check, or fan out across a cross-browser grid for release sign-off. ChatGPT's agent is one hosted browser; BrowserBash lets you choose the browser to match the job.
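
The engine and provider split maps naturally onto a small command builder, sketched here for a script that fans the same objective out to several targets. The flag names and allowed values come from the text; the helper `build_command`, the flag ordering, the space-separated value form for `--provider` and `--engine`, and the rule that cdp requires an endpoint are assumptions.

```python
from typing import List, Optional

# Values named in the article.
PROVIDERS = {"local", "cdp", "browserbase", "lambdatest", "browserstack"}
ENGINES = {"stagehand", "builtin"}


def build_command(
    objective: str,
    provider: str = "local",
    engine: Optional[str] = None,
    cdp_endpoint: Optional[str] = None,
) -> List[str]:
    """Build an argv list for a run against the chosen provider and engine."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    if engine is not None and engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if provider == "cdp" and not cdp_endpoint:
        # Assumed rule: a CDP run is meaningless without an endpoint.
        raise ValueError("the cdp provider needs a cdp_endpoint")

    cmd = ["browserbash", "run", objective, "--provider", provider]
    if engine is not None:
        # Left unset, the tool picks its own default engine.
        cmd += ["--engine", engine]
    if provider == "cdp":
        cmd += ["--cdp-endpoint", str(cdp_endpoint)]
    return cmd
```

Returning a list rather than a single string keeps the English objective as one argument, so quotes and spaces inside it need no shell escaping.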

A decision guide: which one fits your task

Be honest about what you are actually doing, and the choice gets easy.

Reach for ChatGPT agent mode or Atlas when:

You have a one-off, exploratory task — research, comparison shopping, trip planning — and you want a smart agent to grind through it while you watch.
You value the reasoning of a large hosted model and do not mind the latency or the paid tier.
The task is naturally human-in-the-loop, with a person available to handle logins, MFA, and the final confirm.
You are not trying to integrate the result into a script or pipeline.

Reach for BrowserBash when:

You need to run the same flow repeatedly — every deploy, every PR, on a schedule.
You want it in CI with a clean pass/fail exit code and NDJSON output, not a chat transcript.
Privacy or cost rules out a hosted model, and you want to run locally on Ollama for $0.
You need control over the exact browser, session, and headless mode.
You want committable, reviewable tests that live next to your code as markdown.

And to be fair to OpenAI: for a knowledge worker who occasionally needs a browser-driving assistant, ChatGPT's agent is the better experience. It is more capable on novel, ambiguous tasks, the UI is friendlier, and you do not manage any infrastructure. If that is you, use it. BrowserBash is not trying to be a better research assistant — it is trying to be the tool you put in a pipeline. Browse the case study to see what that looks like in practice.

Can ChatGPT replace browser automation tools? Not quite

It is tempting to read "ChatGPT can control a browser" as "ChatGPT replaces Playwright, Selenium, and the rest." It does not, and the reasons are concrete.

Traditional automation gives you determinism: the same selectors, the same waits, the same result every run, which is exactly what a regression suite needs. ChatGPT agent mode gives you adaptability: it figures out a page it has never seen, which is exactly what exploratory work needs. These are different virtues, and you usually want both in a mature setup.

BrowserBash sits in an interesting middle. It keeps the English-objective adaptability that makes ChatGPT's agent feel magic — no selectors, no page objects — but wraps it in a CLI with machine-readable output, committable markdown tests, a local run store at ~/.browserbash/runs, and the option to run fully offline. It is the natural-language interface aimed at the engineering workflow rather than the chat window. You can see the full feature list on the features page or read the blog for more walkthroughs.
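
The comparison table mentions `{{variables}}` in markdown test files. BrowserBash's actual file format and resolution rules are not described in this article, so the sketch below only illustrates the general idea of filling placeholders in an English objective. The function `render_objective` and its strict handling of missing names are hypothetical.

```python
import re
from typing import Mapping

# Matches {{name}} with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_objective(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {{name}} with its value; fail if a name is missing."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            # Failing early beats sending a half-filled objective to an agent.
            raise KeyError(f"missing variable: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(replace, template)
```

Keeping the environment-specific values out of the objective text is what makes such a test reviewable: the English stays the same across staging and production, and only the variable values change.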

So the realistic 2026 picture is a toolbox, not a winner: ChatGPT's agent for ad-hoc, human-supervised browser tasks; deterministic frameworks for the most brittle, high-stakes flows; and a natural-language CLI like BrowserBash for the large middle, meaning the repeatable, scriptable checks where you want an agent's flexibility but a pipeline's discipline.

FAQ

Can ChatGPT control a web browser?

Yes. Through agent mode (the successor to Operator) ChatGPT can click, type, scroll, and navigate to complete multi-step tasks, and the ChatGPT Atlas browser puts an agent inside a real Chromium browser. The control is real but supervised: it runs in a sandbox or asks for a human takeover on logins and payments, and it will not solve CAPTCHAs or bypass bot detection.

Is ChatGPT Operator still available in 2026?

No. OpenAI deprecated the standalone Operator product and shut down operator.chatgpt.com on August 31, 2025. Its browser-controlling capabilities were folded into ChatGPT agent mode, which you reach by switching the composer into agent mode, and later into the ChatGPT Atlas browser. Older tutorials that point to the Operator site are out of date.

Is it safe to let ChatGPT control my browser?

It depends on which surface and how you use it. Agent mode's hosted sandbox is fairly isolated, but in-browser agents like Atlas inherit your logged-in sessions, and researchers have shown working prompt-injection attacks that can hijack the agent's intent. Keep the action log visible, never let an agent run unattended on banking or admin tabs, and handle sensitive logins yourself.

What is the best way to give an AI agent a real browser for testing?

For repeatable, scriptable browser tasks, a CLI that drives your own Chrome is a better fit than a hosted chat agent. BrowserBash takes a plain-English objective, drives a real local Chrome, and returns a verdict plus NDJSON for CI, and it runs free on a local Ollama model so nothing leaves your machine. It is built for pipelines and coding agents rather than a chat window.

Ready to give an agent a real browser you control? Install with npm install -g browserbash-cli and you are running in seconds — no account needed. When you want the optional cloud dashboard, sign up here (it is free, and entirely optional).