The Allen Institute for AI (Ai2) has launched Molmo 2, an open multimodal model family for video and multi-image understanding with precise spatial and temporal reasoning. Molmo 2 builds on the original Molmo, which pioneered image pointing for multimodal AI systems, and adds video pointing, multi-frame reasoning, and object tracking. These capabilities enable it to “understand what is happening, where it is happening, and what it means” in both short clips and multi-image sets, while using far less training data than many proprietary alternatives.
The family includes several variants. The 8B-parameter model surpasses last year’s 72B-parameter Molmo on key tasks and outperforms some proprietary models, such as Gemini 3, on video tracking. The smaller 4B variant excels at multi-image reasoning against other open models despite its more compact architecture.
Molmo 2 supports frame-level spatial and temporal grounding, multi-object tracking, dense long-form captioning, anomaly detection, and detailed video QA. Ai2 is releasing all model weights, training datasets, evaluation tools, and data recipes to foster open research and real-world applications in robotics, scientific research, industrial automation, and assistive technologies. The release includes nine new open datasets used in training, which contain millions of multimodal examples for benchmarking and development.
“With a fraction of the data, Molmo 2 surpasses many frontier models on key video understanding tasks. We are excited to see the immense impact this model will have on the AI landscape, adding another piece to our fully open model ecosystem,” said Ali Farhadi.
Building on Ai2’s commitment to open science, Molmo 2 aims to democratize access to state-of-the-art video intelligence. It offers a fully transparent platform that researchers and developers can inspect, adapt, and extend without the limitations of closed-source systems, which helps the broader AI community innovate with high-performance multimodal AI.


