Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation

Jonathan Mutal, Perla Al Almaoui, Simon Hengchen, Pierrette Bouillon

TL;DR

The paper addresses the under-representation of Arabic dialects by modeling dialectal Arabic as a pluricentric language. It introduces Aladdin-FTI, a model trained with a joint objective that combines machine translation (MT) among dialectal Arabic (DA), Modern Standard Arabic (MSA), and English with instruction-conditioned next-token generation (GEN) of dialectal output. The authors show that translation improves diglossia-sensitive translation and semantic adequacy, while generation improves dialectal fidelity; combining MT and generation yields balanced performance on both, enabling smaller models to approach or match larger baselines. The work contributes a principled dual-objective approach to practical Arabic NLP and releases its code and models publicly.
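To make the dual objective concrete, the following is a minimal Python sketch of how MT and instruction-conditioned generation examples might be pooled into a single next-token-prediction training set. The prompt templates, field names, and the helpers `mt_examples`, `gen_example`, and `build_joint_dataset` are hypothetical illustrations, not the authors' released code; the only idea carried over from the paper is that both tasks are cast as conditional generation over DA, MSA, and English.

```python
import random
from typing import Dict, List

# Hypothetical prompt templates (assumption); the paper's actual instructions may differ.
MT_TEMPLATE = "Translate the following {src} text into {tgt}:\n{text}"
GEN_TEMPLATE = "Respond in {dialect} Arabic.\n{instruction}"

Example = Dict[str, str]


def mt_examples(src_lang: str, src_text: str, tgt_lang: str, tgt_text: str) -> List[Example]:
    """Turn one parallel pair into two MT examples, one per translation direction."""
    return [
        {"task": "MT",
         "prompt": MT_TEMPLATE.format(src=src_lang, tgt=tgt_lang, text=src_text),
         "target": tgt_text},
        {"task": "MT",
         "prompt": MT_TEMPLATE.format(src=tgt_lang, tgt=src_lang, text=tgt_text),
         "target": src_text},
    ]


def gen_example(dialect: str, instruction: str, response: str) -> Example:
    """Instruction-conditioned generation: the target is text in the requested dialect."""
    return {"task": "GEN",
            "prompt": GEN_TEMPLATE.format(dialect=dialect, instruction=instruction),
            "target": response}


def build_joint_dataset(mt: List[Example], gen: List[Example], seed: int = 0) -> List[Example]:
    """Pool both tasks into a new list and shuffle it so batches mix MT and GEN.

    In this sketch, training would then use ordinary next-token prediction on
    prompt + target, commonly with the loss computed on target tokens only.
    """
    mixed = mt + gen  # new list; the inputs are left unchanged
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Angle-bracket strings are placeholders, not real data.
    mt = mt_examples("Egyptian Arabic", "<EGY sentence>", "English", "<English sentence>")
    gen = [gen_example("Moroccan", "<instruction>", "<Moroccan response>")]
    data = build_joint_dataset(mt, gen)
    print(len(data), sorted(ex["task"] for ex in data))  # 3 ['GEN', 'MT', 'MT']
```

Casting both objectives as prompt-to-target pairs means a single decoder-only model and a single loss can serve both tasks; the MT/GEN mixing ratio is a free design choice that this sketch leaves at whatever the input sizes imply.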

Abstract

Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their non-standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.

Paper Structure (30 sections, 1 equation, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Trade-off between diglossia-sensitive translation accuracy (ChrF++) and dialectal fidelity (Macro ADI2). Each faded point corresponds to a decoding configuration (learning rate × checkpoint), while highlighted points indicate the best configuration selected per model. Instruction-based generation (GEN) favours dialectal fidelity at the expense of diglossia, whereas MT exhibits the opposite behaviour. The combined MT+GEN objective achieves the best overall balance, improving both fidelity and diglossia.
  • Figure 2: Performance for diglossia (ChrF++) and fidelity (Macro ADI2) across training paradigms. Each boxplot corresponds to a training paradigm (Baseline, MT, GEN, MT+GEN) using SmolLM3-3B, and each point represents a distinct decoding configuration (top_p, temperature, learning rate), with scores macro-averaged over language varieties and test sets.
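Both captions rest on two simple operations: macro-averaging a metric over language varieties and test sets, and reading the ChrF++ versus Macro ADI2 trade-off across configurations. The Python sketch below illustrates both under stated assumptions. The macro average is taken as an unweighted mean over (variety, test set) cells, which equals nested averaging only when every variety has the same test sets. The Pareto-front helper is an illustrative way to identify non-dominated configurations; the captions do not state the rule actually used to select each model's highlighted configuration. The numbers in the usage block are toy values, not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Tuple

# scores[(variety, test_set)] -> metric value for one decoding configuration
Scores = Dict[Tuple[str, str], float]


def macro_average(scores: Scores) -> float:
    """Unweighted mean over (variety, test set) cells: each cell counts equally,
    regardless of how many sentences it contains."""
    return mean(scores.values())


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return configurations not dominated on (ChrF++, Macro ADI2).

    Both metrics are treated as higher-is-better. A configuration is dominated
    if another one is at least as good on both metrics and strictly better on one.
    """
    front = []
    for name, (chrf, adi) in points.items():
        dominated = any(
            o_chrf >= chrf and o_adi >= adi and (o_chrf > chrf or o_adi > adi)
            for other, (o_chrf, o_adi) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Toy values for illustration only, not results from the paper.
    cells = {("EGY", "t1"): 40.0, ("EGY", "t2"): 50.0,
             ("MOR", "t1"): 30.0, ("MOR", "t2"): 40.0}
    print(macro_average(cells))  # 40.0
    configs = {"cfg_a": (40.0, 0.30), "cfg_b": (35.0, 0.50), "cfg_c": (34.0, 0.45)}
    print(pareto_front(configs))  # ['cfg_a', 'cfg_b']
```

Here `cfg_c` drops out because `cfg_b` beats it on both axes, while `cfg_a` and `cfg_b` each win on one metric, mirroring the MT-versus-GEN tension the figures describe.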