dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

Mohamed Lichouri, Khaled Lounnas, Boualem Nadjib Zahaf, Mehdi Ayoub Rabiai

TL;DR

The paper tackles country-level Arabic dialect identification in the NADI 2024 MLDID subtask using simple TF-IDF features, across three experiments: (1) a union of word, character, and character-with-word-boundary n-grams; (2) a weighted fusion of these TF-IDF features fed to a LinearSVC classifier; and (3) a weighted hard-voting ensemble of LSVC, RF, and KNN. Despite its simplicity, the approach achieves the highest precision among participating teams (63.22%), but its overall F1 is low (about 21%) because recall is only 12.87%: the models miss many of the dialect labels they should assign. The study emphasizes the effectiveness of character-level features, balanced class weights, and weighted feature fusion, and reports that the ensemble did not outperform a well-tuned single classifier. The findings offer practical guidance for improving Arabic dialect identification systems in resource-constrained settings and across multilingual data collections.
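To make the feature setup concrete, here is a minimal sketch of a weighted union of word, character, and character-with-word-boundary TF-IDF features feeding a LinearSVC with balanced class weights. It assumes scikit-learn; the n-gram ranges, the fusion weights, the one-vs-rest wrapper for multi-label output, and the toy sentences are all illustrative assumptions, not the values or data used by the authors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Three TF-IDF views of the same text. The n-gram ranges and the
# weights below are hypothetical; the summary does not report them.
features = FeatureUnion(
    transformer_list=[
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(1, 4))),
        ("char_wb", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4))),
    ],
    # Each block of features is scaled by its weight before concatenation.
    transformer_weights={"word": 0.5, "char": 1.0, "char_wb": 1.0},
)

# One binary LinearSVC per country label (an assumption made here to get
# multi-label output); class_weight="balanced" reweights rare labels.
model = Pipeline(
    [
        ("tfidf", features),
        ("clf", OneVsRestClassifier(LinearSVC(class_weight="balanced"))),
    ]
)

# Toy training set, invented for illustration only.
texts = ["واش راك خويا", "ازيك عامل ايه", "واش راك اليوم", "عامل ايه النهارده"]
labels = [["Algeria"], ["Egypt"], ["Algeria"], ["Egypt"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one indicator column per country

model.fit(texts, Y)
pred = model.predict(["واش راك"])  # indicator row: one 0/1 entry per country
print(dict(zip(mlb.classes_, pred[0])))
```

Because each label gets its own binary decision, a sample can receive zero, one, or several country labels, which is the shape of output a multi-label task expects.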

Abstract

This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the challenge: in Experiment 1, we utilized a union of n-gram analyzers (word, character, character with word boundaries) with different n-gram values; in Experiment 2, we combined a weighted union of Term Frequency-Inverse Document Frequency (TF-IDF) features with various weights; and in Experiment 3, we implemented a weighted majority voting scheme using three classifiers: Linear Support Vector Classifier (LSVC), Random Forest (RF), and K-Nearest Neighbors (KNN). Our approach, despite its simplicity and reliance on traditional machine learning techniques, demonstrated competitive performance in terms of F1 score and precision. Notably, we achieved the highest precision score of 63.22% among the participating teams. However, our overall F1 score was approximately 21%, significantly impacted by a low recall rate of 12.87%. This indicates that while our models were highly precise, they struggled to recall a broad range of dialect labels, highlighting a critical area for improvement in handling diverse dialectal variations.
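The weighted voting idea in Experiment 3 can be sketched in a few lines of plain Python. This is one possible reading, not the authors' implementation: each classifier casts a 0/1 vote per country label, and a label is kept when the weight behind the "yes" votes exceeds half of the total weight. The function name `weighted_hard_vote`, the vote matrix, and the weights are hypothetical.

```python
from typing import Sequence


def weighted_hard_vote(
    votes: Sequence[Sequence[int]], weights: Sequence[float]
) -> list[int]:
    """Combine per-label 0/1 votes from several classifiers.

    votes[i][j] is classifier i's decision for label j. Label j is
    predicted when the summed weight of classifiers voting 1 is strictly
    greater than half of the total weight.
    """
    if len(votes) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    n_labels = len(votes[0])
    combined = []
    for j in range(n_labels):
        yes_weight = sum(w for v, w in zip(votes, weights) if v[j] == 1)
        combined.append(1 if yes_weight > total / 2 else 0)
    return combined


# Hypothetical votes for three labels from LSVC, RF, and KNN (in that order).
votes = [
    [1, 0, 1],  # LSVC
    [0, 0, 1],  # RF
    [0, 1, 1],  # KNN
]

# Trusting LSVC more lets it carry label 0 on its own.
print(weighted_hard_vote(votes, [3.0, 1.0, 1.0]))  # [1, 0, 1]
# With equal weights the same votes reduce to a plain majority.
print(weighted_hard_vote(votes, [1.0, 1.0, 1.0]))  # [0, 0, 1]
```

The strict threshold makes the ensemble conservative on ties, which tends to favor precision over recall; a lower threshold would trade in the other direction.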

Paper Structure (5 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Proposed system for Arabic Dialect Identification.