ZDNET’s key takeaways
- Google’s new AI model can interact directly with website UIs.
- It joins similar tools from OpenAI and Anthropic.
- Google also acknowledged the model's weaknesses, including hallucinations.
Google DeepMind has debuted a new AI model in public preview that’s designed to navigate a web browser just as a human would.
Built atop Gemini 2.5 Pro, the company’s new Computer Use model can execute tasks like clicking, typing, and scrolling directly within a web page.
Users simply feed it a natural-language prompt, such as, "Open Wikipedia, search for 'Atlantis,' and summarize the history of the myth in Western thought." The model then autonomously fetches the URL and screenshots of the requested site to analyze the user interface it needs to act within, and performs the requested task step by step, outlining its reasoning and actions in a text box that's easily visible to users. If it's instructed to perform a sensitive task, like making a purchase, it may also pause and ask for confirmation.
The preview of Gemini 2.5 Computer Use follows the release of similar web-browsing models from OpenAI and Anthropic. Google previously debuted an experimental Chrome extension called Project Mariner, which can also take action on behalf of users within web pages.
How it works
Gemini 2.5 Computer Use operates in an iterative loop: it keeps a record of its recent actions within a particular user interface and uses that history to determine its next action. The more steps it performs within a given site, the more context it accumulates, and the more seamlessly it functions.
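Google hasn't published the internals of that loop, but the behavior it describes maps onto a familiar screenshot-act-observe agent cycle. The sketch below is illustrative only: it drives a browser with Playwright, and propose_next_action() is a placeholder standing in for the actual Gemini 2.5 Computer Use call; the action names and history format are assumptions, not Google's API.

```python
# Illustrative agent loop, not Google's implementation. Playwright handles the
# browser; propose_next_action() is a placeholder for the model call, and the
# action schema below is an assumption made for this sketch.
from playwright.sync_api import sync_playwright

def propose_next_action(goal, screenshot_png, history):
    """Placeholder: send the goal, the latest screenshot, and the recent action
    history to the model and return its next proposed action, e.g.
    {"type": "click", "x": 412, "y": 188} or {"type": "done"}."""
    raise NotImplementedError("wire this up to the real model API")

def run_agent(goal, start_url, max_steps=20):
    history = []  # the model reasons over its own recent actions
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            screenshot = page.screenshot()  # current UI state as pixels
            action = propose_next_action(goal, screenshot, history)
            if action["type"] == "done":
                break
            elif action["type"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type":
                page.keyboard.type(action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action["dy"])
            history.append(action)  # feeds context into the next iteration
        browser.close()
```

Each pass through the loop sends a fresh screenshot plus the accumulated action history back to the model, which is what lets it build up context the longer it works within a site.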
Google posted demo videos (sped up 3x) showing the model autonomously making an update in a customer relationship management site and rearranging notes on Google’s Jamboard platform, which was discontinued at the end of last year.
According to a blog post published by Google on Tuesday, the new model outperformed similar tools from Anthropic and OpenAI on both accuracy and latency across "multiple web and mobile control benchmarks," including Online-Mind2Web, an evaluation framework for testing the performance of web-browsing agents.
How to try it
The new model is intended mainly for web browsers, but it also shows "strong promise" on mobile, Google said. It's available now via the Gemini API in Google AI Studio and on Vertex AI. A demo version is also available via Browserbase.
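For developers, access goes through the standard Gemini API surface. A minimal first request might look like the sketch below using the google-genai Python SDK; the model ID string and the computer-use tool configuration shown here are assumptions based on the preview naming, so check both against Google's current documentation.

```python
# Minimal sketch of a first Computer Use request via the google-genai SDK.
# The model ID and the computer_use tool config are assumptions for this
# preview; confirm both against Google's current API reference.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER  # web-browser UI
        )
    )]
)

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview model ID
    contents="Open Wikipedia, search for 'Atlantis,' and summarize the myth.",
    config=config,
)

# The response contains the model's proposed UI action (click, type, scroll),
# which the calling code executes before sending back a fresh screenshot.
print(response.candidates[0].content.parts)
```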
Safety considerations
The new model also comes with a set of safety controls, which Google says developers can use to prevent it from performing undesired actions like bypassing CAPTCHAs, compromising data security, or gaining control of medical devices. For example, developers can instruct the model to request user confirmation before it performs certain specified actions.
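One way a developer-side harness might enforce that confirmation rule is to gate specific action types on explicit user approval before executing them. The snippet below is a hypothetical illustration; the action format and the SENSITIVE_ACTIONS list are assumptions, not part of Google's API.

```python
# Illustrative safety gate: pause for explicit user approval before executing
# action types the developer has flagged as sensitive. The action dict format
# and this list are assumptions for the sketch, not Google's API.
SENSITIVE_ACTIONS = {"purchase", "submit_form", "accept_terms"}

def confirm_before_executing(action: dict) -> bool:
    """Return True only if the action is non-sensitive or the user approves it."""
    if action.get("type") not in SENSITIVE_ACTIONS:
        return True
    answer = input(f"Model wants to perform '{action['type']}'. Allow? [y/N] ")
    return answer.strip().lower() == "y"
```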
The company also noted in the system card for the new model that it “may exhibit some of the general limitations of foundation models, as it is based off of Gemini 2.5 Pro, such as hallucinations, and limitations around causal understanding, complex logical deduction, and counterfactual reasoning.”
Those limitations are true of most models. Earlier this week, Anthropic published new research showing that many frontier AI models tended to blow the whistle on what they interpreted as unethical or illegal activity in test scenarios, even when the supposedly incriminating information was actually harmless.