TCE at Qur'an QA 2023 Shared Task: Low Resource Enhanced Transformer-based Ensemble Approach for Qur'anic QA
Mohammed Alaa Elkomy, Amany Sarhan
TL;DR
This work tackles Qur’an QA in low-resource Arabic. For Task A (passage retrieval), it combines transfer learning from external Arabic resources with ensemble voting over dual-encoder and cross-encoder architectures. For Task B (extractive machine reading comprehension, MRC), it fine-tunes pre-trained language models with FAL and MAL learning and post-processes the predicted answers. It introduces faithful data splits to avoid leakage and uses external resources (tafseer and TyDi QA GoldP) to boost learning. On the hidden split, it achieves a MAP of 25.05% for Task A, far above the TF-IDF baseline's 9.03% MAP, and a pAP of 57.11% for Task B. The main contributions are the integration of external resources, an ensemble framework, thresholding for zero-answer (unanswerable-question) detection, and a careful dataset-splitting strategy that improves generalization under data scarcity. The released code and models provide a reproducible pipeline for low-resource QA over highly structured religious texts.
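The ensemble voting step for Task A can be illustrated with a minimal Python sketch. It assumes soft voting, i.e. averaging each passage's relevance score over several runs or models and ranking passages by the mean. The function names `ensemble_scores` and `rank_passages`, the score dictionaries, and the top-k cutoff are hypothetical, and the paper's exact voting rule may differ.

```python
from collections import defaultdict
from typing import Dict, List


def ensemble_scores(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Soft voting (hypothetical scheme): average each passage's score over all runs.

    A passage missing from a run implicitly contributes 0.0 for that run.
    """
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        for passage_id, score in run.items():
            totals[passage_id] += score
    # Divide by the total number of runs, not by how many runs saw the passage.
    return {pid: total / len(runs) for pid, total in totals.items()}


def rank_passages(scores: Dict[str, float], k: int = 10) -> List[str]:
    """Return the top-k passage IDs ordered by descending ensemble score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


runs = [
    {"p1": 0.9, "p2": 0.4},  # run 1 (hypothetical scores)
    {"p1": 0.7, "p2": 0.6},  # run 2
    {"p2": 0.5},             # run 3 did not retrieve p1
]
ranking = rank_passages(ensemble_scores(runs))
```

Here `p1` averages about 0.53 and `p2` averages 0.50, so `ranking` is `["p1", "p2"]`. Treating a passage absent from a run as scoring 0.0 penalizes passages that only a few runs retrieve, which is one simple way to reward agreement across runs.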
Abstract
In this paper, we present our approach to Qur'an QA 2023 shared tasks A and B. To address the challenge of low-resource training data, we rely on transfer learning together with a voting ensemble that improves prediction stability across multiple runs. Additionally, we employ different architectures and learning mechanisms for a range of Arabic pre-trained transformer-based models for both tasks. To identify unanswerable questions, we propose a thresholding mechanism. Our top-performing systems greatly surpass the baseline performance on the hidden split, achieving a MAP score of 25.05% for task A and a partial Average Precision (pAP) of 57.11% for task B.
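One simple form of the thresholding mechanism is sketched below in Python, under stated assumptions: each candidate (a passage for Task A or an answer span for Task B) carries a single confidence score, and a question is declared unanswerable when even the top-scoring candidate falls below a threshold that would typically be tuned on held-out data. The function name `answer_or_abstain` and the convention of returning an empty list for "no answer" are hypothetical, not the paper's confirmed output format.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (candidate ID, confidence score); hypothetical format


def answer_or_abstain(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Hypothetical zero-answer rule based on the best candidate's score.

    Returns the candidates sorted by descending score, or an empty list
    (abstain: predict "no answer") if the best score is below `threshold`.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if ranked[0][1] < threshold:
        return []  # even the top candidate is not confident enough
    return ranked


confident = answer_or_abstain([("p2", 0.1), ("p1", 0.8)], threshold=0.3)
unsure = answer_or_abstain([("p1", 0.2), ("p2", 0.1)], threshold=0.3)
```

With a threshold of 0.3, `confident` keeps both candidates ordered as `[("p1", 0.8), ("p2", 0.1)]`, while `unsure` is empty because its best score (0.2) falls below the threshold. Gating only on the top score keeps the decision a single comparison; a stricter variant could also drop low-scoring candidates from answered questions.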
