Monday, September 29, 2025

NVIDIA Unveils Granary: An Open Multilingual Speech AI Dataset with High-Performance Canary and Parakeet Models

NVIDIA announced the launch of Granary, a massive open-source dataset comprising approximately 1 million hours of multilingual audio, along with two high-performance speech AI models, Canary-1b-v2 and Parakeet-tdt-0.6b-v3. These resources are designed to advance speech recognition and translation technologies across 25 European languages, including underrepresented ones such as Croatian, Estonian and Maltese.

Granary delivers around 650,000 hours of audio for automatic speech recognition (ASR) and over 350,000 hours for automatic speech translation (AST), supporting developers in building scalable, high-quality multilingual speech applications.

In collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the NVIDIA speech AI team developed a processing pipeline using the NeMo Speech Data Processor toolkit. This pipeline converts unlabeled audio into structured, high-quality datasets without extensive manual annotation. The dataset covers 25 languages: nearly all of the 24 official European Union languages, plus Russian and Ukrainian, enabling more inclusive speech AI development.

The models built on Granary are now available on Hugging Face, and the work will be presented at Interspeech in the Netherlands (August 17–21).

Granary Highlights:

  • Supports fast development of production-scale applications such as multilingual chatbots, customer service voice agents, and near-real-time translation services.

  • Provides critical resources for languages with limited existing datasets.

  • Enables developers to reach target levels of ASR and AST accuracy with roughly half as much training data as other datasets require.

Model Spotlight:

  • NVIDIA Canary-1b-v2, a 1-billion-parameter model, provides high-quality transcription across European languages and translation between English and the other 24 supported languages. It leads the Hugging Face leaderboard for multilingual speech recognition accuracy.

  • NVIDIA Parakeet-tdt-0.6b-v3, a streamlined 600-million-parameter model, is optimized for real-time and large-volume transcription workloads, achieving the highest throughput among multilingual models on Hugging Face.

Both models offer automatic punctuation, capitalization, and word-level timestamps. Canary-1b-v2 delivers quality comparable to models three times its size while running inference up to ten times faster.
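To make the word-level timestamp feature concrete, here is a minimal sketch of how a downstream application might turn per-word timings into caption-style segments. The `WordStamp` record, the `group_into_segments` helper, the 0.5-second pause threshold, and the sample data are all hypothetical illustrations, not the output format or API of Canary, Parakeet, or NeMo.

```python
from dataclasses import dataclass


@dataclass
class WordStamp:
    """One transcribed word with start/end times in seconds (hypothetical format)."""
    word: str
    start: float
    end: float


def group_into_segments(words, max_gap=0.5):
    """Group consecutive words into segments.

    A new segment starts whenever the pause between the end of one word
    and the start of the next exceeds max_gap seconds.
    Returns a list of (start, end, text) tuples.
    """
    segments = []
    current = []
    for w in words:
        if current and w.start - current[-1].end > max_gap:
            segments.append(current)
            current = []
        current.append(w)
    if current:
        segments.append(current)
    return [
        (seg[0].start, seg[-1].end, " ".join(w.word for w in seg))
        for seg in segments
    ]


# Made-up sample data for illustration only, not real model output.
sample = [
    WordStamp("Hello", 0.00, 0.40),
    WordStamp("world.", 0.45, 0.90),
    WordStamp("Good", 2.00, 2.30),
    WordStamp("morning.", 2.35, 2.90),
]

for start, end, text in group_into_segments(sample):
    print(f"[{start:.2f}s - {end:.2f}s] {text}")
```

Because punctuation and capitalization already arrive in the transcript, a helper like this only needs the timings to produce readable captions. A real integration would read the timestamps from whatever structure the chosen model's toolkit returns.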

By sharing Granary and these models, including the underlying methodology, NVIDIA enables developers worldwide to adapt this workflow to build or enhance other ASR and AST models and to support additional languages.
