Google DeepMind has introduced Gemini Robotics 1.5, an AI model designed to give robots advanced reasoning, perception, and action capabilities in real-world environments. The release is a significant step toward general-purpose, intelligent robots.
Introducing Gemini Robotics 1.5
Gemini Robotics 1.5 is DeepMind’s most capable vision-language-action (VLA) model, turning visual information and instructions into precise motor commands. The model exposes its reasoning process, which makes its decisions more transparent and helps robots assess and complete complex tasks. Gemini Robotics 1.5 also learns across embodiments, so skills acquired on one robotic platform can carry over to others.
The model’s capabilities include:
- Generalization: Gemini Robotics 1.5 adapts to new situations, objects, and environments, effectively handling tasks it has not encountered during training.
- Interactivity: The model understands and responds to natural language instructions, adjusting its actions based on real-time environmental changes.
- Dexterity: Gemini Robotics 1.5 performs intricate, multi-step tasks requiring fine motor skills, such as origami folding and precise object manipulation.
- Embodiment Versatility: Trained primarily on the ALOHA 2 bi-arm platform, Gemini Robotics 1.5 has been successfully adapted to various robotic systems, including Apptronik’s humanoid robot, Apollo, and the Franka bi-arm platform.
Advancing Embodied Reasoning with Gemini Robotics-ER 1.5
Alongside Gemini Robotics 1.5, DeepMind introduced Gemini Robotics-ER 1.5, an embodied reasoning model that improves spatial understanding and task planning. The model works with low-level controllers, enabling robots to perform end-to-end tasks autonomously. Gemini Robotics-ER 1.5 achieves state-of-the-art performance across 15 academic benchmarks, including Embodied Reasoning Question Answering (ERQA) and Point-Bench.
Gemini Robotics-ER 1.5 is now available to developers via the Gemini API in Google AI Studio, where it can be used to build robotic applications.
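For developers, one practical step after calling a spatial-reasoning model is turning its answer into coordinates a robot or camera pipeline can use. The Python sketch below is a minimal illustration, not DeepMind's implementation. It assumes the model was asked to point at objects and replied with a JSON list of entries like {"point": [y, x], "label": "..."}, with coordinates normalized to a 0-1000 range. That format, the helper names, and the sample reply are all assumptions made for illustration and should be checked against the current Gemini API documentation. The API call itself is left out so the example runs offline.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker a model may wrap around JSON


def strip_code_fence(text: str) -> str:
    """Drop a surrounding Markdown code fence if the reply has one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    return "\n".join(lines)


def points_to_pixels(response_text: str, width: int, height: int) -> list[dict]:
    """Map assumed [y, x] points in the 0-1000 range to pixel (x, y) values."""
    items = json.loads(strip_code_fence(response_text))
    pixels = []
    for item in items:
        y_norm, x_norm = item["point"]  # assumed order: y first, then x
        pixels.append({
            "label": item.get("label", ""),
            "x": round(x_norm / 1000 * width),
            "y": round(y_norm / 1000 * height),
        })
    return pixels


# Hypothetical reply for a 640x480 camera frame (not real model output).
sample_reply = (
    '[{"point": [500, 250], "label": "mug"},'
    ' {"point": [100, 900], "label": "sponge"}]'
)

for target in points_to_pixels(sample_reply, width=640, height=480):
    print(target)
```

In a real application, the pixel coordinates would typically be passed to whatever grasp planner or low-level controller the robot uses. That hand-off depends on the platform and is not shown here.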
Commitment to Responsible AI Development
DeepMind remains dedicated to advancing AI and robotics responsibly. The company collaborates with its Responsibility & Safety Council (RSC) and the Responsible Development & Innovation (ReDI) team to ensure that Gemini Robotics models align with AI safety principles. These efforts aim to mitigate risks and promote the safe deployment of AI agents in human-centric environments.