Gemini 2.5 'Computer Use' AI Model Can Autonomously Navigate Websites
JAKARTA - Google's Gemini 2.5 Computer Use model is a new AI agent that can explore the web and interact with user interfaces (UI) independently. It can click, type, and scroll through pages in response to text commands. Built on Gemini 2.5 Pro, the technology is now available to developers and marks a major step towards a general-purpose AI that can complete digital tasks the way a human would.
Google is rolling out an ambitious new AI model designed to interact with the internet in a highly human-like way. Called Gemini 2.5 Computer Use, this AI can navigate web browsers, click buttons, fill out forms, and even scroll through pages, all from simple text instructions.
This is a significant step towards creating an AI agent that can perform complex digital tasks autonomously. The model goes beyond simple chatbot responses and actively engages with user interfaces.
The Core of Gemini 2.5 Computer Use
Built on the capabilities of Gemini 2.5 Pro, this AI model distinguishes itself by operating in a virtual browser environment. Unlike some competing AI agents that can access the entire desktop operating system, Google's model focuses specifically on web and mobile interfaces.
This approach allows it to handle everyday digital tasks that previously required human intervention or complicated API integration. Imagine an AI filling out detailed online forms, navigating busy websites, or adding items to a shopping cart based on a list, all without much complexity.
At the core of Gemini 2.5 Computer Use is an iterative feedback loop. When a user assigns a task to the AI, the model first receives the request, a screenshot of the current screen, and a history of previous actions.
It then processes this information and proposes a specific UI action, such as clicking a link, typing text into a field, or scrolling down. Client-side code executes the action, the screen updates, and a new screenshot is sent back to the AI. This loop continues until the initial task is complete.
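The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's API: the names `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are all hypothetical, the "screenshot" is a text summary rather than an image, and the "model" is a fixed script standing in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A proposed UI action (hypothetical shape, not the real API schema)."""
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: records state instead of rendering pixels."""
    fields: dict = field(default_factory=dict)
    clicked: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real client would capture image bytes; this returns a text summary.
        return f"fields={self.fields} clicked={self.clicked}"

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click":
            self.clicked.append(action.target)
        # "scroll" changes nothing in this simplified stand-in.


def scripted_model(request: str, screenshot: str, history: List[Action]) -> Action:
    """Hypothetical model: replays a fixed plan based on how many steps are done."""
    plan = [Action("type", "email", "a@example.com"), Action("click", "submit")]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


def run_agent(request: str, browser: FakeBrowser,
              propose: Callable[[str, str, List[Action]], Action],
              max_steps: int = 10) -> List[Action]:
    """Repeat: send request + screenshot + history, execute the proposed action."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = propose(request, browser.screenshot(), history)
        if action.kind == "done":      # the model signals the task is complete
            break
        browser.execute(action)        # client-side code performs the action
        history.append(action)         # the next turn sees the updated history
    return history
```

Calling `run_agent("Sign up with a@example.com", FakeBrowser(), scripted_model)` runs two turns of the loop (a type action, then a click) and stops when the stand-in model reports it is done. The `max_steps` cap is a design choice of this sketch so that a model that never finishes cannot loop forever.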
Google has optimized the model primarily for web browsers, though it also shows promise for controlling mobile apps. Inside Google, versions of the model are already used for tasks such as UI testing, which speeds up software development.
Focus on Performance and Security
Google claims the Gemini 2.5 Computer Use model outperforms leading alternatives on various web and mobile benchmarks, with lower latency. Demonstrations show the AI competently handling tasks such as playing the game 2048 or browsing websites. Interestingly, brief tests even demonstrated its ability to solve Google Search's CAPTCHA, a significant hurdle for non-human users.
However, Google also emphasizes safety. The company is aware of the unique risks associated with AI agents controlling computers: bad actors could abuse the technology, and the AI itself could behave unexpectedly. With this in mind, the company has built safety features directly into the model. Developers also receive tools to prevent the AI from taking high-risk actions, such as compromising system security or bypassing CAPTCHAs, without explicit permission from users.
Currently, Gemini 2.5 Computer Use is available to developers via the Gemini API in Google AI Studio and Vertex AI. It cannot be accessed directly by consumers. However, the technology paves the way for a future in which AI handles more of our routine digital interactions.
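One way to picture the developer-side control described above is a small gate that asks the user before a high-risk action runs. The sketch below is an assumption for illustration only: the `HIGH_RISK` set, the action names in it, and the `guarded_execute` function are hypothetical and are not taken from Google's tooling.

```python
from typing import Callable

# Hypothetical action kinds treated as high-risk in this sketch.
HIGH_RISK = {"solve_captcha", "purchase", "change_security_settings"}


def guarded_execute(action_kind: str,
                    execute: Callable[[str], None],
                    confirm: Callable[[str], bool]) -> bool:
    """Run the action unless it is high-risk and the user declines.

    Returns True if the action was executed, False if it was blocked.
    """
    if action_kind in HIGH_RISK and not confirm(action_kind):
        return False               # blocked: no explicit permission was given
    execute(action_kind)           # low-risk, or explicitly approved
    return True
```

In this sketch, `confirm` would typically prompt the user, while `execute` hands the approved action to the client-side code that drives the browser.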