Thursday, January 23, 2025

Google’s new RT-2 AI model helps robots interpret visual and language patterns

Google is taking a significant step towards more capable robots with the introduction of Robotics Transformer 2 (RT-2), a vision-language-action (VLA) model. Building on the earlier RT-1 model, RT-2 helps robots recognise visual and language patterns more effectively, so they can interpret instructions and infer which objects best fulfil a specific request.

In recent experiments, researchers tested RT-2 with a robotic arm in a simulated kitchen office scenario. The robot was asked to pick an object that could serve as an improvised hammer (it chose a rock) and to choose a drink for an exhausted person (it chose a Red Bull). The researchers also instructed the robot to move a Coke can to a picture of Taylor Swift, revealing the robot’s surprising preference for the famous singer.

Prior to the advent of VLA models like RT-2, teaching robots required arduous and time-consuming individual programming for each specific task. The power of these advanced models, however, enables robots to draw from a vast pool of information, allowing them to make informed inferences and decisions on the fly.

Google’s endeavour to create more intelligent robots started the previous year, when it announced that it was integrating its large language model (LLM) PaLM into robotics. That work culminated in the PaLM-SayCan system, which aimed to unite LLMs with physical robotics and laid the foundation for Google’s current achievements.

Nevertheless, the new robot is not without its imperfections. During a live demonstration observed by The New York Times, the robot struggled with correctly identifying soda flavours and occasionally misidentified fruit as the colour white. Such glitches highlight the ongoing challenges in refining AI technology for real-world applications.

SOURCE: BusinessToday
