Reverb: Open-Source ASR and Diarization from Rev

Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

TL;DR

Reverb is a set of open-source ASR and diarization models, released for non-commercial use, designed for long-form speech, and trained on a large, human-transcribed English corpus. The ASR component uses a WeNet-based conformer CTC/attention architecture with verbatimicity control and a production WFST-based decoder, plus a Turbo variant for speed and memory efficiency. Diarization builds on pyannote.audio, with a Rev-fine-tuned v1 and a WavLM-based v2 to improve speaker attribution, and is evaluated with Word Diarization Error Rate on multiple long-form benchmarks. Together, the releases give researchers and developers a practical, configurable foundation to benchmark models, tune verbatimicity, and deploy end-to-end diarized transcripts in real-world settings.
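
The summary above names Word Diarization Error Rate without defining it. The following is a minimal sketch, assuming the commonly used definition: among the words the ASR alignment marks as correct or substituted, the fraction attributed to the wrong speaker. It also assumes that word alignment and the mapping between reference and hypothesis speaker labels have already been done. The function name and input format are hypothetical and are not taken from Rev's evaluation code.

```python
from typing import List, Tuple

# One aligned word: (reference speaker, hypothesis speaker).
# Only words the aligner marked as correct or substituted are included;
# insertions and deletions have no counterpart to compare speakers with.
AlignedWord = Tuple[str, str]


def word_diarization_error_rate(aligned: List[AlignedWord]) -> float:
    """Return the fraction of aligned words attributed to the wrong speaker."""
    if not aligned:
        raise ValueError("need at least one aligned word")
    wrong = sum(1 for ref_spk, hyp_spk in aligned if ref_spk != hyp_spk)
    return wrong / len(aligned)


# Hypothetical toy input: four aligned words, one with the wrong speaker.
example = [("A", "A"), ("A", "A"), ("B", "A"), ("B", "B")]
print(word_diarization_error_rate(example))  # 0.25
```

Because the metric is computed per word rather than per unit of time, it reflects what a reader of a diarized transcript would actually see: how many words carry the wrong speaker label.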

Abstract

Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers and pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all existing open-source speech recognition models across a variety of long-form speech recognition domains.

Paper Structure (11 sections, 6 tables)