NVIDIA Canary-1B-v2 Brings Multilingual Speech to Python Workflows

NVIDIA has released Canary-1B-v2, an open-source automatic speech recognition (ASR) and speech translation model designed for Python-based development workflows. The model, announced June 23, 2026, handles multilingual transcription and translation with support for English, French, German, Spanish, and Italian.

The release targets developers building accessibility tools, localization pipelines, and global content workflows. Canary-1B-v2 accepts 16 kHz mono audio and outputs both transcriptions and word-level timestamps, enabling developers to generate synchronized subtitles automatically.
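As a rough illustration of the input requirement described above, the sketch below checks whether a WAV file is 16 kHz mono before it is handed to a speech model. It uses only Python's standard-library wave module; the function name is_16k_mono is hypothetical and is not part of any NVIDIA tooling.

```python
import wave


def is_16k_mono(path):
    """Return True if the WAV file at `path` is 16 kHz and single-channel.

    Hypothetical helper for illustration only; not an NVIDIA API.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Files that fail this check would need to be resampled or downmixed first, for example with an external audio tool.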

“We’re making speech AI accessible to every developer,” said NVIDIA’s ML engineering team. “Canary-1B-v2 runs on a single GPU and produces production-quality transcriptions for major languages.”

Key features include automatic SRT export, which generates subtitle files with proper timing markers without additional post-processing. The model also supports batch processing for handling longer audio files, with inference speeds optimized for real-time or near-real-time applications.
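To make the subtitle format concrete, here is a minimal sketch of how word-level timestamps could be grouped into SRT cues. It is not the model's own export routine: the (word, start, end) tuple layout, the function names, and the fixed words-per-cue grouping are all assumptions chosen for illustration.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start_sec, end_sec) tuples into numbered SRT cues.

    The input layout is an assumed format, not the model's actual output.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        index = len(cues) + 1
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Calling words_to_srt([("hello", 0.0, 0.4), ("world", 0.5, 1.0)]) yields a single cue spanning 00:00:00,000 to 00:00:01,000. A production pipeline would more likely break cues at pauses or punctuation rather than at a fixed word count.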

The model uses a compact 1-billion-parameter architecture that balances accuracy with inference speed. It achieves competitive word error rates (WER) on standard speech benchmarks while running efficiently on consumer-grade GPUs.
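For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's output (substitutions, deletions, and insertions), divided by the number of words in the reference. The sketch below is a plain implementation of that standard definition, not NVIDIA's evaluation code; it does no text normalization, which real benchmarks usually apply first.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER as word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat" with the hypothesis "the bat sat" gives one substitution over three reference words, a WER of about 0.33.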

For developers, Canary-1B-v2 integrates directly into Python workflows via the NVIDIA PyTorch ecosystem. The model supports standard audio preprocessing and outputs structured data that can feed existing localization and accessibility tooling.

The release addresses growing demand for speech-to-text capabilities in enterprise applications, particularly in sectors requiring multilingual customer support, content localization, and accessibility compliance. Healthcare and legal organizations have shown particular interest in accurate transcription with timestamps for audit trails.

Canary-1B-v2 joins NVIDIA’s expanding speech AI portfolio, which includes earlier releases targeting specific languages and use cases. The model’s open-source availability and Python-native integration lower the barrier for developers adopting speech AI in production systems.