ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction
Resolving conflict and improving system combination in Arabic GEC
Ahlam Alrehili, Areej Alhothali
TL;DR
This work tackles Arabic Grammatical Error Correction (GEC) by introducing ArbESC+, a two-stage system that fuses outputs from nine Arabic-focused models and text-editing systems. It casts system combination as edit-level binary classification, employing agreement boosting, dual-threshold filtering, and Non-Maximum Suppression on 1D spans to resolve conflicts and select high-quality edits. Empirical results on QALB-2014/2015 show state-of-the-art performance (e.g., $F_{0.5}$ improvements over strong single models and ensemble baselines), confirming the value of targeted conflict resolution in Arabic GEC. The approach demonstrates stability across datasets and suggests wider applicability to low-resource languages where data scarcity and complex morphology pose challenges.
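The selection steps named in the TL;DR (agreement boosting, dual-threshold filtering, and Non-Maximum Suppression on 1D spans) can be illustrated with a minimal Python sketch. The summary does not give the boosting formula, the threshold values, or the tie-breaking rules, so the `Edit` fields, the `select_edits` function, and every default parameter below are assumptions chosen for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    start: int        # token offset where the edit begins (inclusive)
    end: int          # token offset where the edit ends (exclusive)
    replacement: str  # proposed replacement text
    score: float      # hypothetical classifier probability that the edit is correct
    votes: int        # number of systems that proposed this exact edit


def overlaps(a: Edit, b: Edit) -> bool:
    """True when two non-empty half-open spans [start, end) share a position."""
    return a.start < b.end and b.start < a.end


def select_edits(edits, low=0.3, high=0.6, boost=0.05, min_votes=2):
    """Pick a conflict-free set of edits (all defaults are assumed values)."""
    # Agreement boosting (assumed form): a small bonus per additional vote.
    boosted = [(min(1.0, e.score + boost * (e.votes - 1)), e) for e in edits]

    # Dual-threshold filtering (assumed form): keep confident edits outright,
    # and keep borderline edits only when enough systems agree on them.
    kept = [
        (s, e) for s, e in boosted
        if s >= high or (s >= low and e.votes >= min_votes)
    ]

    # 1D Non-Maximum Suppression: visit edits from highest to lowest score
    # and drop any edit whose span overlaps one that was already selected.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    selected = []
    for _, edit in kept:
        if all(not overlaps(edit, chosen) for chosen in selected):
            selected.append(edit)

    # Return edits in sentence order so they can be applied left to right.
    return sorted(selected, key=lambda e: e.start)


if __name__ == "__main__":
    proposals = [
        Edit(0, 1, "A", 0.90, 3),
        Edit(0, 2, "B", 0.70, 1),  # overlaps the first edit
        Edit(3, 4, "C", 0.40, 2),  # borderline score, but two systems agree
        Edit(5, 6, "D", 0.40, 1),  # borderline score, single system
    ]
    for edit in select_edits(proposals):
        print(edit)
```

In this sketch the overlap test assumes non-empty spans; pure insertions (where `start == end`) would need an extra rule so that two competing insertions at the same position are treated as conflicting.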
Abstract
Grammatical Error Correction (GEC) is an important task in natural language processing. Arabic has a complex morphological and syntactic structure, which makes GEC more challenging than in many other languages. Although neural models have improved considerably in recent years, most previous work relied on individual models and did not explore the potential benefits of combining different systems. In this paper, we present one of the first multi-system approaches for correcting grammatical errors in Arabic, the Arabic Enhanced Edit Selection System Combination (ArbESC+). Correction proposals are collected from several models and represented as numerical features. A classifier then uses these features to decide which corrections to apply. To improve output quality, the framework uses supporting techniques to filter overlapping corrections and estimate decision reliability. A combination of AraT5, ByT5, mT5, AraBART, AraBART+Morph+GEC, and text-editing systems gave better results than any single model alone, reaching an $F_{0.5}$ of 82.63% on the QALB-14 test data, 84.64% on QALB-15 L1 data, and 65.55% on QALB-15 L2 data. A key contribution of this work is that it is the first attempt for Arabic to integrate multiple error correction systems. By improving on existing models, it offers a practical step toward advanced tools that will benefit users and researchers of Arabic text processing.
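For reference, the $F_{0.5}$ scores reported here follow the standard $F_\beta$ measure with $\beta = 0.5$, which weights precision more heavily than recall. Writing $P$ and $R$ for edit-level precision and recall (symbols introduced here for illustration, not taken from the abstract):

$$
F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R},
\qquad
F_{0.5} = \frac{1.25\, P\, R}{0.25\, P + R}.
$$

Because precision dominates this score, a combination step that discards uncertain or conflicting edits can raise $F_{0.5}$ even if it lowers recall slightly.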
