Ara-Best-RQ: Multi Dialectal Arabic SSL

Haroun Elleuch; Ryan Whetten; Salima Mdhaffar; Yannick Estève; Fethi Bougares

Abstract

We present Ara-BEST-RQ, a family of self-supervised learning (SSL) models specifically designed for multi-dialectal Arabic speech processing. Leveraging 5,640 hours of crawled Creative Commons speech and combining it with publicly available datasets, we pre-train Conformer-based BEST-RQ models with up to 600M parameters. Our models are evaluated on dialect identification (DID) and automatic speech recognition (ASR) tasks, achieving state-of-the-art performance on the former while using fewer parameters than competing models. We demonstrate that family-targeted pre-training on Arabic dialects significantly improves downstream performance compared to multilingual or monolingual models trained on non-Arabic data. All models, code, and pre-processed datasets will be publicly released to support reproducibility and further research in Arabic speech technologies.

Paper Structure (13 sections, 1 figure, 5 tables)

This paper contains 13 sections, 1 figure, 5 tables.

Introduction
Related Work
Dataset
Crawled Dataset
Combined Dataset
Experiments & Results
Ara-BEST-RQ pre-training
Downstream Fine-tuning
Automatic Speech Recognition
Dialect Identification
Limitations
Conclusion
Acknowledgements

Figures (1)

Figure 1: Distribution of the full training set (in hours) by dialect.
