JAKARTA - Google has just introduced Robotics Transformer 2 (RT-2), a Vision-Language-Action (VLA) model for taking action in the real world.
Google trained this Artificial Intelligence (AI)-based model to understand and process text and images from the web. With this data, RT-2 can directly output actions for a robot to perform.
"Unlike chatbots, robots need an creation in the real world and their capabilities. Their training is not just about, like learning everything that needs to be known about apples, how apples grow, and their physical properties," said DeepMind Google Robotics Scientist and Head Vincent Vanhoucke, in the company's official blog., quoted Monday, July 31.
"Robots must be able to recognize apples in context, differentiate them from red balls, understand what they look like, and most importantly, know how to take them," he added.
This latest research has improved robots' ability to reason; they can even use chain-of-thought prompting, a way of breaking multi-step problems into smaller steps.
The introduction of vision models such as PaLM-E helped robots better understand their surroundings. And RT-1, RT-2's predecessor, showed that Transformers, known for their ability to generalize information across systems, could even help different types of robots learn from each other.
In testing the RT-2 model in more than 6,000 robotic trials, scientists found that RT-2 performed as well as its predecessor on tasks from its training data, known as "seen" tasks.
"And that almost doubled its performance in the novel, the invisible scenario to 62 percent from 32 percent RT-1. In other words, with RT-2, robots can learn more like us, transferring concepts learned to new situations," explains Vanhoucke.
Because RT-2 is able to transfer knowledge from a large body of web data, it already has an idea of what trash is and can identify it without explicit training.
"He even had an idea how to dispose of trash, even though he was never trained to do that," Vanhoucke said.
"And think about the abstract nature of the garbage - what used to be a bag of chips or banana peels into trash after you eat them. RT-2 is able to understand this from its visual language training data and do its work," he continued.
Vanhoucke stated that RT-2's ability to transfer information into actions shows promise for robots to adapt more quickly to new situations and environments.
"RT-2 not only shows how AI's progress flows rapidly into robotics, but also shows enormous promises for more common robots," Vanhoucke said.
"Although there is still much work to be done to activate a robot that helps in human-centered environments, RT-2 shows us an exciting future for robotics in just a hand," he added.