NVIDIA Unveils Granary: An Open Multilingual Speech AI Dataset with High-Performance Canary and Parakeet Models

NVIDIA announced the launch of Granary, a massive open-source dataset comprising approximately 1 million hours of multilingual audio, along with two high-performance speech AI models, Canary-1b-v2 and Parakeet-tdt-0.6b-v3. These resources are designed to advance speech recognition and translation technologies across 25 European languages, including underrepresented ones such as Croatian, Estonian and Maltese.

Granary delivers around 650,000 hours of audio for automatic speech recognition (ASR) and over 350,000 hours for automatic speech translation (AST), supporting developers in building scalable, high-quality multilingual speech applications.

In collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the NVIDIA speech AI team developed a processing pipeline using the NeMo Speech Data Processor toolkit. This pipeline converts unlabeled audio into structured, high-quality datasets without extensive manual annotation. The dataset's 25 languages cover nearly all of the 24 official European Union languages, plus Russian and Ukrainian, enabling more inclusive speech AI development.

The models built on Granary are now available on Hugging Face and will be presented at Interspeech in the Netherlands (August 17–21).

Granary Highlights:

Model Spotlight:

Both models provide automatic punctuation, capitalization, and word-level timestamps. Canary-1b-v2 delivers quality comparable to models three times its size while running inference up to ten times faster.
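To illustrate why word-level timestamps matter to developers, the following is a minimal Python sketch of one common downstream use: grouping timestamped words into caption segments. The `Word` structure, the `group_into_captions` helper, and the threshold values are hypothetical and chosen for illustration only; they are not the output format or API of Canary or Parakeet.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one recognized word (not a model API)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def group_into_captions(words, max_gap=0.6, max_words=8):
    """Group timestamped words into (start, end, text) caption tuples.

    A new caption begins when the pause before a word exceeds max_gap
    seconds, or when the current caption already holds max_words words.
    """
    segments = []
    current = []
    for word in words:
        if current:
            pause = word.start - current[-1].end
            if pause > max_gap or len(current) >= max_words:
                segments.append(current)
                current = []
        current.append(word)
    if current:
        segments.append(current)
    return [
        (seg[0].start, seg[-1].end, " ".join(w.text for w in seg))
        for seg in segments
    ]


# Made-up sample data standing in for a transcription result.
words = [
    Word("Hello", 0.00, 0.40),
    Word("world.", 0.45, 0.90),
    Word("Granary", 2.00, 2.60),
    Word("is", 2.65, 2.80),
    Word("open.", 2.85, 3.30),
]

for start, end, text in group_into_captions(words):
    print(f"{start:.2f}-{end:.2f}  {text}")
```

With the sample data, the pause of just over one second before "Granary" splits the words into two captions. In practice, the gap and length thresholds would be tuned to the target language and display format.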

By sharing Granary and these models, including the underlying methodology, NVIDIA enables developers worldwide to adapt this workflow to build or enhance other ASR and AST models and to support additional languages.
