Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges

Per E Kummervold, Javier de la Rosa, Freddy Wetjen, Rolv-Arild Braaten, Per Erik Solberg

TL;DR

This work introduces NB-Whisper, a Norwegian-focused fine-tuning of OpenAI's Whisper designed to handle Norway's dialectal diversity and its two written standards (Bokmål and Nynorsk), and to translate into English. Through a two-phase training regime and dataset cleaning, NB-Whisper achieves substantial reductions in word error rate (WER) compared with OpenAI Whisper Large for both Bokmål and Nynorsk on benchmarks such as Fleurs, NST, and Common Voice Nynorsk. The Large model delivers the strongest results (e.g., Bokmål NST WER ≈ 2.2%, Nynorsk Common Voice WER ≈ 12.6%). The approach combines a diverse data mix (NRK subtitles, audiobooks, NST, parliamentary speech) with targeted data-cleaning heuristics, underscoring the importance of handling orthographic variation and of dialect robustness in Norwegian ASR. The work also discusses the limitations of evaluation tools for Norwegian and proposes future directions in dataset standardization and architectural adaptations that could apply to other languages with rich dialectal and orthographic variation.

Abstract

This article introduces NB-Whisper, an adaptation of OpenAI's Whisper fine-tuned specifically for Norwegian automatic speech recognition (ASR). We highlight its key contributions and summarise the results achieved in converting spoken Norwegian into written forms and translating other languages into Norwegian. We show that NB-Whisper improves on the Norwegian Bokmål transcription of OpenAI Whisper Large-v3, reducing the WER from 10.4 to 6.6 on the Fleurs dataset and from 6.8 to 2.2 on the NST dataset.
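
To make the WER figures above concrete, the following is a minimal sketch of how a word error rate and a relative WER reduction can be computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction` and the example sentences are hypothetical, and the sketch assumes plain whitespace tokenisation with no text normalisation, which a real evaluation would likely add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly
    (no lower-casing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up sentences for illustration only: one substituted word out of three.
    print(word_error_rate("jeg heter per", "jeg heiter per"))
    # WER values quoted in the abstract (Fleurs and NST, Bokmål).
    print(relative_reduction(10.4, 6.6))
    print(relative_reduction(6.8, 2.2))
```

Under this definition, the reductions quoted in the abstract correspond to relative WER reductions of roughly 37% on Fleurs and roughly 68% on NST.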

Paper Structure (10 sections, 7 tables)