ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks

Haroun Elleuch, Youssef Saidi, Salima Mdhaffar, Yannick Estève, Fethi Bougares

TL;DR

This work tackles robust Arabic dialect processing under NADI 2025, addressing ADI and multi-dialectal ASR. It employs large pre-trained models with targeted fine-tuning: a two-stage ADI pipeline using Whisper-large-v3 and per-dialect fine-tuning of SeamlessM4T-v2 Large for ASR. The ADI contribution achieves top performance with an accuracy of $79.83\%$, while the ASR track achieves second place with an average $WER=38.54\%$ and $CER=14.53\%$ on the test data, validating the efficacy of dialect-specific adaptation. The findings demonstrate that combining large pre-trained speech models with careful, dialect-focused fine-tuning yields strong, practical gains for Arabic speech processing and supports scalable deployment across diverse dialects.

Abstract

This paper describes Elyadata & LIA's joint submission to the NADI 2025 multi-dialectal Arabic Speech Processing shared task. We participated in the Spoken Arabic Dialect Identification (ADI) and multi-dialectal Arabic ASR subtasks. Our submission ranked first in the ADI subtask and second in the multi-dialectal Arabic ASR subtask among all participants. Our ADI system is a fine-tuned Whisper-large-v3 encoder trained with data augmentation. It obtained the highest ADI accuracy, 79.83%, on the official test set. For multi-dialectal Arabic ASR, we fine-tuned SeamlessM4T-v2 Large (Egyptian variant) separately for each of the eight considered dialects. Overall, we obtained an average WER of 38.54% and an average CER of 14.53% on the test set. Our results demonstrate the effectiveness of large pre-trained speech models with targeted fine-tuning for Arabic speech processing.
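For readers unfamiliar with the two ASR metrics quoted above, the following is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the official NADI scoring script: the function names `edit_distance` and `error_rate` are hypothetical, and the choices made here (whitespace tokenization, counting spaces as characters for CER, pooling errors over all utterances, no text normalization) are assumptions that may differ from the shared task's evaluation.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level error rate in percent: total edits / total reference length.

    unit="word" gives WER (whitespace tokens); unit="char" gives CER
    (every character, spaces included; an assumption of this sketch).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_errors = 0
    total_length = 0
    for ref, hyp in zip(refs, hyps):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_errors += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    if total_length == 0:
        raise ValueError("reference is empty; error rate is undefined")
    return 100.0 * total_errors / total_length


if __name__ == "__main__":
    references = ["a b c d"]
    hypotheses = ["a x c"]
    print(error_rate(references, hypotheses, unit="word"))  # WER in percent
    print(error_rate(references, hypotheses, unit="char"))  # CER in percent
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which is consistent with the gap between the two averages reported above.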

Paper Structure

This paper contains 16 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Confusion matrix on the provided development set.