
Google’s new RT-2 AI model helps robots interpret visual and language patterns


Google is taking a significant leap in enhancing the intelligence of its robots with the introduction of Robotics Transformer 2 (RT-2), an advanced AI learning model. A vision-language-action (VLA) model that builds on its predecessor, RT-1, RT-2 equips robots with the ability to recognise visual and language patterns more effectively, enabling them to interpret instructions accurately and deduce the most suitable objects to fulfil specific requests.
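
To make the idea concrete, here is a minimal, hypothetical sketch of the kind of control loop a vision-language-action model drives: the model conditions on a camera image and a plain-language instruction and emits a short sequence of discrete action tokens. The ToyVLAPolicy class, the token layout and the decode step are illustrative assumptions, not RT-2’s published interface.

```python
# Illustrative sketch only: the real RT-2 is a large vision-language model
# fine-tuned to emit robot actions as text tokens. The classes and the token
# layout below are hypothetical stand-ins, not Google's actual API.
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Action:
    tokens: List[int]  # e.g. six pose-delta tokens plus a terminate flag

    def decode(self) -> dict:
        # Hypothetical decoding: map the first six tokens to normalised
        # end-effector deltas in [-1, 1]; the last token means "stop".
        deltas = [(t - 128) / 128.0 for t in self.tokens[:6]]
        return {"pose_delta": deltas, "terminate": bool(self.tokens[6])}


class ToyVLAPolicy:
    """Stand-in for a vision-language-action model.

    A real VLA model runs a transformer over image patches and instruction
    text; this toy version returns a fixed token sequence so only the shape
    of the control loop is illustrated.
    """

    def predict(self, image: bytes, instruction: str) -> Action:
        _ = (image, instruction)  # a real model conditions on both inputs
        return Action(tokens=[140, 120, 128, 128, 128, 128, 1])


def control_loop(policy: ToyVLAPolicy, camera: Callable[[], bytes],
                 instruction: str, max_steps: int = 100) -> Iterator[dict]:
    # Closed-loop control: re-query the model on every new camera frame
    # until it emits the terminate token or the step budget runs out.
    for _ in range(max_steps):
        action = policy.predict(camera(), instruction).decode()
        yield action
        if action["terminate"]:
            break


if __name__ == "__main__":
    fake_camera = lambda: b"jpeg-bytes"  # placeholder for a real camera feed
    for step in control_loop(ToyVLAPolicy(), fake_camera,
                             "move the coke can to the picture of taylor swift"):
        print(step)
```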

In recent experiments, researchers put RT-2 to the test with a robotic arm in a mock office-kitchen setting. The robot was asked to identify an improvised hammer (it chose a rock) and to pick a drink to offer an exhausted person (it chose a Red Bull). The researchers also instructed the robot to move a Coke can to a picture of Taylor Swift, revealing its apparent fondness for the singer.



Before the advent of VLA models like RT-2, teaching robots required arduous, time-consuming programming for each individual task. These advanced models, however, let robots draw on a vast pool of information, allowing them to make informed inferences and decisions on the fly.

Google’s endeavour to create more intelligent robots began last year, when it announced it would integrate its large language model PaLM into robotics, culminating in the PaLM-SayCan system. That system aimed to unite LLMs with physical robotics and laid the foundation for Google’s current achievements.
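
For context, the SayCan recipe can be summarised in a few lines: a language model scores how useful each pre-trained skill would be for the instruction (“say”), a learned affordance model scores how likely that skill is to succeed from the robot’s current state (“can”), and the robot executes the skill with the highest combined score. The sketch below is a simplified illustration; the hand-written scoring functions are stand-ins for PaLM and the learned value functions used by the real system.

```python
# Simplified illustration of the SayCan idea (details assumed): an LLM rates
# each skill's usefulness for the instruction ("say"), an affordance model
# rates its chance of succeeding from the current state ("can"), and the
# robot runs the skill with the highest product of the two scores.
from typing import Callable, Dict, List


def select_skill(instruction: str,
                 state: Dict,
                 skills: List[str],
                 say_score: Callable[[str, str], float],
                 can_score: Callable[[str, Dict], float]) -> str:
    scored = {s: say_score(instruction, s) * can_score(s, state) for s in skills}
    return max(scored, key=scored.get)


# Toy usage: hand-written scorers stand in for the LLM and the value functions.
skills = ["pick up the sponge", "go to the sink", "pick up the apple"]
say = lambda instr, skill: 0.9 if "spill" in instr and "sponge" in skill else 0.1
can = lambda skill, state: 0.8 if skill in state.get("reachable", []) else 0.05

print(select_skill("clean up the spill", {"reachable": ["pick up the sponge"]},
                   skills, say, can))  # -> "pick up the sponge"
```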

Nevertheless, the new robot is not without its imperfections. During a live demonstration observed by The New York Times, the robot struggled to identify soda flavours correctly and occasionally misidentified the colour of a piece of fruit as white. Such glitches highlight the ongoing challenges in refining AI technology for real-world applications.

SOURCE: BusinessToday
