Open Universal Arabic ASR Leaderboard

Yingzhi Wang, Anas Alhmoud, Muhammad Alqurishi

TL;DR

This work introduces the Open Universal Arabic ASR Leaderboard, a continuously updated, open-source benchmark that evaluates open-source Arabic ASR models on several multi-dialect datasets to assess their generalization and robustness. Models are evaluated zero-shot, grouped into Whisper, Conformer, self-supervised, and multi-task categories, and tested on SADA, Common Voice Arabic, MASC, and MGB-2. WER and CER are reported after applying a standardized text normalization protocol. Conformer-CTC-large-Arabic with a language model (LM) leads the leaderboard, and performance generally scales with the amount of training data. Robustness varies across dialects: models perform best on Modern Standard Arabic (MSA), while Egyptian and Khaliji (Gulf) speech is more challenging. The efficiency and resource analysis reveals a trade-off: self-supervised models are more efficient under light inference loads, whereas larger Whisper models demand more memory and compute. The work also discusses biases and limitations and positions the leaderboard as a long-term, continuously updated reference for Arabic ASR research.
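Results are reported as WER and CER. Both are edit-distance metrics: the minimum number of substitutions, deletions, and insertions that turn the reference transcript into the hypothesis, divided by the number of reference words (WER) or characters (CER). The Python sketch here illustrates the computation under stated assumptions: corpus-level aggregation (total edits over total reference tokens), spaces counted as characters for CER, and a hypothetical `normalize` function that only strips Arabic diacritics and the tatweel. It is not the leaderboard's actual normalization protocol or scoring code.

```python
import re
from typing import List, Sequence

# Hypothetical normalizer for illustration only; NOT the leaderboard's protocol.
# It removes Arabic diacritics (U+064B..U+0652) and the tatweel (U+0640),
# then collapses runs of whitespace into single spaces.
_DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text: str) -> str:
    text = _DIACRITICS_AND_TATWEEL.sub("", text)
    return " ".join(text.split())


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char")."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref, hyp = normalize(ref), normalize(hyp)
        if unit == "word":
            r, h = ref.split(), hyp.split()
        elif unit == "char":
            r, h = list(ref), list(hyp)  # assumption: spaces count as characters
        else:
            raise ValueError("unit must be 'word' or 'char'")
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total
```

For example, `error_rate(["a b c d"], ["a x c"])` counts one substitution and one deletion over four reference words, giving a WER of 0.5. Aggregating edits over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.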

Abstract

In recent years, the enhanced capabilities of ASR models and the emergence of multi-dialect datasets have increasingly pushed Arabic ASR model development toward an all-dialect-in-one direction. This trend highlights the need for benchmarking studies that evaluate model performance on multiple dialects, providing the community with insights into models' generalization capabilities. In this paper, we introduce the Open Universal Arabic ASR Leaderboard, a continuous benchmark project for open-source, general-purpose Arabic ASR models across various multi-dialect datasets. We also provide a comprehensive analysis of the models' robustness, speaker adaptation, inference efficiency, and memory consumption. This work aims to offer the Arabic ASR community a reference for models' general performance and to establish a common evaluation framework for multi-dialectal Arabic ASR models.

Paper Structure

This paper contains 12 sections and 5 tables.