Gemini CLI: Experimental Browser Agent
Gemini CLI v0.31.0 ships an experimental browser agent that gives the terminal AI direct control over a Chrome browser, enabling web navigation, form filling, button clicking, and content extraction through natural language instructions. The agent uses a hybrid architecture combining a Semantic Agent (powered by Chrome's accessibility tree for efficient, precise interactions) with a Visual Agent (backed by Gemini's Computer Use model for visually-dependent tasks like reading color-coded elements or spatial layouts). Sensitive browser actions — including form submissions and file uploads — require explicit user confirmation through the existing policy engine. The feature is disabled by default and must be enabled in settings.json.
Browser Agent Arrives in v0.31.0
Gemini CLI v0.31.0 introduces an experimental browser agent that gives the terminal AI direct control over a Chrome browser. Through natural language instructions, the agent can navigate web pages, fill out forms, click buttons, and extract content — all from the command line.
Hybrid Architecture
The browser agent uses a hybrid architecture with two specialized sub-agents that work together:
Semantic Agent
The Semantic Agent is the primary workhorse. It communicates with Chrome through the Chrome DevTools Protocol (CDP) and relies on the browser's accessibility tree to understand page structure. This approach is efficient and precise — the agent can identify interactive elements, read text content, and perform actions without needing to "see" the page visually. It handles the majority of browsing tasks with low latency and high reliability.
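To illustrate why an accessibility tree lets an agent find controls without rendering pixels, here is a minimal sketch. The `AXNode` class, the `INTERACTIVE_ROLES` set, and the sample page are hypothetical simplifications for illustration. They are not Gemini CLI's or CDP's actual data structures.

```python
from dataclasses import dataclass, field

# Roles treated as interactive in this sketch (an illustrative subset, assumed).
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


@dataclass
class AXNode:
    """Hypothetical, simplified stand-in for an accessibility-tree node."""
    role: str
    name: str = ""
    children: list = field(default_factory=list)


def find_interactive(node: AXNode) -> list:
    """Depth-first walk returning (role, name) for every interactive node."""
    found = []
    if node.role in INTERACTIVE_ROLES:
        found.append((node.role, node.name))
    for child in node.children:
        found.extend(find_interactive(child))
    return found


# A toy login page expressed as a tree of roles and accessible names.
page = AXNode("WebArea", "Login", [
    AXNode("heading", "Sign in"),
    AXNode("textbox", "Email"),
    AXNode("textbox", "Password"),
    AXNode("group", "", [AXNode("button", "Submit")]),
])

for role, name in find_interactive(page):
    print(f"{role}: {name}")
```

The walk surfaces the two text boxes and the Submit button by role and name alone, which is the kind of structural lookup that makes this path cheaper than screenshot analysis.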
Visual Agent
When a task requires visual understanding, such as reading color-coded elements, interpreting spatial layouts, or interacting with canvas-based UIs, the Semantic Agent hands off to the Visual Agent. Powered by Gemini's Computer Use model, the Visual Agent takes screenshots of the browser viewport and uses pointer-based actions (clicks, drags, scrolls) to interact with the page. The handoff is automatic: the Semantic Agent recognizes when a task exceeds its capabilities and delegates without user intervention.
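The source does not describe how the Semantic Agent decides to delegate, so the following is only a sketch of one possible routing rule. The `Task` class, the `VISUAL_HINTS` keywords, and `choose_agent` are all hypothetical names, and the keyword heuristic is an assumption rather than the actual mechanism.

```python
from dataclasses import dataclass

# Hypothetical cues that a task depends on pixels rather than page structure.
VISUAL_HINTS = ("color", "colour", "layout", "canvas", "chart", "position")


@dataclass
class Task:
    instruction: str
    # Assumed flag: whether the target element is exposed in the accessibility tree.
    target_in_ax_tree: bool


def choose_agent(task: Task) -> str:
    """Return "semantic" or "visual" for a task, using a simple assumed heuristic."""
    if not task.target_in_ax_tree:
        # Nothing to act on structurally (e.g. a canvas UI), so fall back to vision.
        return "visual"
    text = task.instruction.lower()
    if any(hint in text for hint in VISUAL_HINTS):
        return "visual"
    return "semantic"


print(choose_agent(Task("Click the Submit button", True)))
print(choose_agent(Task("Which row is highlighted in a red color?", True)))
```

Under this rule the semantic path stays the default, and the visual path is used only when structure alone cannot answer the request.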
Browser Session Modes
The agent supports two session modes:
- Isolated mode: Launches a dedicated Chrome instance with a clean profile. This is the default and recommended mode for most tasks, ensuring a predictable environment without interference from existing browser state.
- Existing mode: Attaches to a running Chrome instance (requires Chrome M144 or later). This mode is useful when you need to interact with pages that require existing authentication cookies or session state.
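The choice between the two modes can be sketched as a small decision function. The `SessionMode` enum and `resolve_mode` are hypothetical names, not part of Gemini CLI. The only detail taken from the text above is the Chrome M144 minimum for existing mode.

```python
from enum import Enum
from typing import Optional

# From the text above: existing mode requires Chrome M144 or later.
MIN_CHROME_MAJOR_FOR_EXISTING = 144


class SessionMode(Enum):
    ISOLATED = "isolated"
    EXISTING = "existing"


def resolve_mode(needs_existing_session: bool,
                 chrome_major: Optional[int]) -> SessionMode:
    """Pick a session mode; isolated is the default, as described above."""
    if not needs_existing_session:
        return SessionMode.ISOLATED
    if chrome_major is None or chrome_major < MIN_CHROME_MAJOR_FOR_EXISTING:
        raise ValueError(
            f"existing mode needs Chrome M{MIN_CHROME_MAJOR_FOR_EXISTING} or later"
        )
    return SessionMode.EXISTING


print(resolve_mode(False, None).value)
print(resolve_mode(True, 145).value)
```

Raising an error instead of silently falling back to isolated mode is a design choice in this sketch: a task that depends on existing cookies would otherwise fail later in a less obvious way.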
Security Model
The browser agent includes several layers of security:
- URL allowlist: Only domains explicitly listed in the configuration can be visited. All other navigation attempts are blocked.
- Blocked URL patterns: Several URL schemes are permanently blocked regardless of allowlist settings: file://, javascript:, data:text/html, chrome://extensions, and chrome://settings/passwords. These restrictions prevent local file access, script injection, and modification of browser security settings.
- Action confirmation: Sensitive browser actions, including form submissions and file uploads, require explicit user confirmation through the existing policy engine. This ensures no data is submitted or files are uploaded without the user's knowledge.
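The three layers above can be sketched as two small checks. This is an illustration under stated assumptions, not Gemini CLI's implementation: the `ALLOWED_DOMAINS` entries, the `SENSITIVE_ACTIONS` action names, and the rule that subdomains of an allowed domain are also allowed are all hypothetical. Only the blocked prefixes come from the list above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries would come from user configuration.
ALLOWED_DOMAINS = {"example.com", "localhost"}

# Blocked prefixes as listed above; checked before the allowlist.
BLOCKED_PREFIXES = (
    "file://",
    "javascript:",
    "data:text/html",
    "chrome://extensions",
    "chrome://settings/passwords",
)

# Hypothetical action names standing in for form submissions and file uploads.
SENSITIVE_ACTIONS = {"submit_form", "upload_file"}


def is_navigation_allowed(url: str) -> bool:
    """Reject blocked prefixes first, then require an allowlisted host."""
    lowered = url.strip().lower()
    if lowered.startswith(BLOCKED_PREFIXES):
        return False
    host = urlsplit(lowered).hostname or ""
    # Assumption: a subdomain of an allowed domain is also allowed.
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def needs_confirmation(action: str) -> bool:
    """Sensitive actions must be confirmed by the user before they run."""
    return action in SENSITIVE_ACTIONS


print(is_navigation_allowed("https://docs.example.com/page"))
print(is_navigation_allowed("file:///etc/passwd"))
print(needs_confirmation("submit_form"))
```

Checking the blocked prefixes before the allowlist mirrors the "regardless of allowlist settings" wording: no allowlist entry can re-enable a blocked scheme.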
Disabled by Default
The browser agent is marked as experimental and is disabled by default. To enable it, users must explicitly opt in through their settings.json configuration file. This cautious rollout approach gives the team time to gather feedback and harden the feature before wider adoption.
Practical Applications
The browser agent opens up several practical use cases directly from the terminal:
- End-to-end testing: Describe test scenarios in natural language and let the agent execute them against a running web application, verifying UI behavior without writing test scripts.
- Web scraping: Extract structured data from websites by instructing the agent to navigate pages and collect information.
- Deployment verification: After deploying a web application, use the agent to verify that pages load correctly, forms work, and key user flows are functional.
- Web-based workflows: Automate repetitive web-based tasks such as filling out forms, downloading reports, or checking dashboards.