Google Introduces Agentic Vision for Gemini 3 Flash
Google has introduced Agentic Vision, a new artificial intelligence (AI) agent capability for its Gemini 3 Flash model. The capability lets the model actively investigate images.
Google explains that the technology combines visual reasoning with code execution to examine fine details. Unlike traditional models, which often guess when details are blurry, Agentic Vision treats vision as an investigative process.
The model formulates a step-by-step plan to zoom in on, crop, and otherwise manipulate an image, so that its answer rests on accurate visual evidence.
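The crop-and-zoom steps can be pictured with a minimal sketch. For illustration only, it assumes that an image is a small grid of grayscale pixel values stored as nested Python lists. `crop` and `enlarge` are hypothetical helper names, not part of any Gemini API, and a real pipeline would more likely rely on an imaging library.

```python
# Assumption for illustration: an "image" is a 2D list of grayscale pixel values.
Image = list[list[int]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Hypothetical helper: return the height x width region starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def enlarge(img: Image, factor: int) -> Image:
    """Hypothetical helper: nearest-neighbour zoom that repeats each pixel factor x factor times."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]  # stretch horizontally
        out.extend(list(wide) for _ in range(factor))      # stretch vertically
    return out


if __name__ == "__main__":
    image = [[r * 10 + c for c in range(6)] for r in range(6)]
    patch = crop(image, top=2, left=3, height=2, width=2)
    print(patch)  # [[23, 24], [33, 34]]
    for row in enlarge(patch, factor=2):
        print(row)
```

Nearest-neighbour enlargement adds no new information; its point is to make a small region occupy more of the model's view before it looks again.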
The agent follows a 'Thinking, Acting, and Observing' loop, a pattern common to agentic AI systems. First, the model analyzes the user's request; next, it runs Python code to manipulate the image; finally, it observes the new result before giving an answer.
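A toy version of this loop is sketched below. The `think` function is a hypothetical stand-in for the model's reasoning, the "image" is a few rows of text, and `crop_tool` plays the role of the Python code the model would run. None of these names come from Google, and in the real system the planning is done by the model itself, not by hard-coded rules.

```python
# Toy "image": rows of characters, with the serial number in a small
# region at the bottom right (an assumption made for illustration).
IMAGE = [
    "............",
    "....chip....",
    "............",
    ".......SN42X",
]


def crop_tool(image, top, left, height, width):
    """Act: hypothetical tool that crops a region of the character grid."""
    return [row[left:left + width] for row in image[top:top + height]]


def think(request, observations):
    """Think: hypothetical planner standing in for the model.

    Returns ("act", crop arguments) while more evidence is needed,
    or ("answer", text) once an observation is available.
    """
    if not observations:
        # No evidence yet: plan a crop of the region that should hold the serial.
        return ("act", {"top": 3, "left": 7, "height": 1, "width": 5})
    # Answer from the most recent observation instead of guessing.
    return ("answer", "".join(observations[-1]))


def agentic_loop(request, image, max_steps=3):
    """Repeat Think -> Act -> Observe until the planner is ready to answer."""
    observations = []
    for _ in range(max_steps):
        kind, payload = think(request, observations)  # Think
        if kind == "answer":
            return payload
        result = crop_tool(image, **payload)          # Act
        observations.append(result)                   # Observe
    return None  # Gave up after max_steps without an answer.


if __name__ == "__main__":
    print(agentic_loop("What is the chip's serial number?", IMAGE))
```

The `max_steps` cap is a design choice for the sketch: it guarantees the loop terminates even if the planner never becomes confident enough to answer.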
According to Google, enabling code execution delivers a consistent quality improvement of 5 to 10 percent across various vision benchmarks. The capability matters most when the model must read fine details, such as a serial number printed on a microchip or a road sign far in the distance.
PlanCheckSolver.com, an AI-powered platform for validating building plans, has already adopted the feature. In its deployment, Gemini 3 Flash automatically crops construction details out of the plans and adds them back to its context window.
Agentic Vision also lets Gemini 3 Flash interact with images directly by annotating them or drawing bounding boxes on them. This technique, called a 'visual whiteboard', helps the model count objects accurately.
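Counting by annotation can be illustrated with classical connected-component labeling. This technique was swapped in for the sketch and does not describe how Gemini 3 Flash works internally. Each blob of `#` cells in a text grid gets a numeric label written onto it and a bounding box, so the count is simply the number of boxes. The grid format and the `label_objects` name are assumptions.

```python
from collections import deque


def label_objects(grid):
    """Label 4-connected blobs of '#' cells (hypothetical helper).

    Returns (annotated_grid, boxes), where each box is (top, left, bottom, right).
    Labels are single characters only while there are fewer than ten objects.
    """
    h, w = len(grid), len(grid[0])
    annotated = [list(row) for row in grid]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != "#" or seen[r][c]:
                continue
            label = str(len(boxes) + 1)
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill over one object
                y, x = queue.popleft()
                annotated[y][x] = label  # "draw" the label onto the object
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return ["".join(row) for row in annotated], boxes


if __name__ == "__main__":
    scene = [
        "##..#",
        "##..#",
        ".....",
        "..###",
    ]
    annotated, boxes = label_objects(scene)
    print("\n".join(annotated))
    print("count:", len(boxes), "boxes:", boxes)
```

Because every object receives a visible mark, a reviewer (or the model, on its next look) can check that nothing was counted twice or missed.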
Google also claims the model is highly capable at visual math and at plotting charts from very dense data tables. Gemini first identifies the raw data, then writes Python code to generate a professional bar chart instead of estimating the numbers probabilistically.
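The table-to-chart workflow can be sketched as two deterministic steps: parse the values exactly as written, then draw bars scaled from those values. The column names and numbers in `RAW_TABLE` are illustrative placeholders, not benchmark results. The text-based `bar_chart` is a dependency-free stand-in for a plotting library such as Matplotlib, whose `pyplot.bar` would draw the same bars graphically.

```python
import csv
import io

# Illustrative placeholder data, not real results.
RAW_TABLE = """model,score
A,62.5
B,71.0
C,48.25
"""


def parse_table(text):
    """Extract (label, value) pairs exactly as written in the table."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["model"], float(row["score"])) for row in reader]


def bar_chart(rows, width=40):
    """Render horizontal text bars scaled to the largest value.

    Bar lengths are computed from the parsed numbers, so nothing is estimated.
    """
    peak = max(value for _, value in rows)
    label_w = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        n = round(value / peak * width)
        lines.append(f"{label:<{label_w}} | {'#' * n} {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bar_chart(parse_table(RAW_TABLE)))
```

Separating parsing from drawing mirrors the point in the text: the numbers come from the data, and the code only turns them into bars.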