CultranAI at PalmX 2025: Data Augmentation for Cultural Knowledge Representation

Hunzalah Hassan Bhatti, Youssef Ahmed, Md Arid Hasan, Firoj Alam

TL;DR

The paper tackles culturally grounded Arabic knowledge representation in LLMs by exploring data augmentation and LoRA fine-tuning of Fanar-1-9B-Instruct within the PalmX Arabic NLP shared task. It introduces PalmX-ext, an augmentation built from the Palm dataset and from new MCQs produced with the NativQA framework and GPT-4.1, yielding a training corpus of more than 22K MCQs. Augmentation plus LoRA improves performance on the PalmX development set and the Palm test set, reaching 84.1% accuracy on the PalmX development set and 70.5% on the blind test. The study also compares LoRA and QLoRA under low-compute constraints, finding LoRA competitive, and shows that external culturally grounded data benefits Arabic models. Overall, the work offers a practical pathway for deploying culturally aware Arabic LLMs with modest compute.
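To make the LoRA setup concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is not the authors' training code; the function names (`matvec`, `lora_forward`, `trainable_params`) and all numbers are hypothetical. LoRA freezes a pretrained weight matrix W and learns a rank-r update B·A scaled by alpha/r; QLoRA keeps the same adapter structure but stores the frozen base weights in 4-bit precision to reduce memory.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply matrix m (given as a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Sequence[float],
                 alpha: float, r: int) -> List[float]:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would receive gradient updates during fine-tuning.
    """
    base = matvec(W, x)                  # frozen pretrained path
    update = matvec(B, matvec(A, x))     # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Return (LoRA adapter parameters, full-matrix parameters) for one layer."""
    return r * (d_in + d_out), d_out * d_in


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    B = [[0.0], [0.0]]  # zero-initialized B: output equals the base layer
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=16, r=1))  # [2.0, 3.0]
    print(trainable_params(4096, 4096, 16))  # (131072, 16777216)
```

Because B is typically initialized to zeros, the adapted layer starts out identical to the pretrained one. For a hypothetical 4096 × 4096 projection with r = 16, the adapter has 131,072 trainable parameters against 16,777,216 in the full matrix, which is why LoRA fits modest compute budgets.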

Abstract

In this paper, we report our participation in the PalmX cultural evaluation shared task. Our system, CultranAI, focused on data augmentation and LoRA fine-tuning of large language models (LLMs) for Arabic cultural knowledge representation. We benchmarked several LLMs to identify the best-performing model for the task. In addition to using the PalmX dataset, we augmented it by incorporating the Palm dataset and by curating a new dataset of over 22K culturally grounded multiple-choice questions (MCQs). Our experiments showed that the Fanar-1-9B-Instruct model achieved the highest performance. We fine-tuned this model on the combined augmented dataset of 22K+ MCQs. On the blind test set, our submitted system ranked 5th with an accuracy of 70.50%, while on the PalmX development set it achieved an accuracy of 84.1%.
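The reported scores are multiple-choice accuracy. The sketch below shows one way such accuracy could be computed, assuming the model is asked to reply with a single option letter from A to D. The label set, the regex-based `extract_choice` helper, and the rule that unparseable outputs count as wrong are our assumptions, not the shared task's official scorer.

```python
import re
from typing import Optional, Sequence

# Assumption: the model answers with one uppercase option letter A-D.
CHOICE_PATTERN = re.compile(r"\b([A-D])\b")


def extract_choice(output: str) -> Optional[str]:
    """Return the first standalone letter A-D in the model output, or None."""
    match = CHOICE_PATTERN.search(output)
    return match.group(1) if match else None


def accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Share of items whose extracted letter equals the gold label.

    Outputs with no recognizable letter are counted as incorrect.
    """
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


if __name__ == "__main__":
    preds = ["B", "The answer is C.", "D) ...", "I am not sure"]
    print(accuracy(preds, ["B", "C", "A", "A"]))  # 2 of 4 correct -> 0.5
```

Matching only uppercase A to D keeps lowercase words such as the English article "a" from being misread as a choice; a stricter scorer might instead constrain decoding to the option letters.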

Paper Structure

This paper contains 19 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Pipeline for extending the PalmX dataset using the NativQA framework and GPT-4.1.
  • Figure 2: Example of a formatted prompt used for Arabic MCQ fine-tuning.
  • Figure 3: Questions solved by both the PalmX-only model and the augmentation-trained model.
  • Figure 4: Questions solved only by the augmentation-trained model.
  • Figure 5: Examples from PalmX Cultural Train Set.
  • ...and 2 more figures
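Figure 2 shows the prompt format used for Arabic MCQ fine-tuning. Its exact wording is not reproduced here, so the sketch below uses a hypothetical English instruction and A to D labels purely to illustrate how a question, its options, and (for training examples) the gold answer can be serialized into one prompt string. `format_mcq_prompt`, `INSTRUCTION`, and `LABELS` are hypothetical names.

```python
from typing import Sequence

# Hypothetical option labels; the real prompt in Figure 2 may use different ones.
LABELS = "ABCD"

# Placeholder instruction text (assumption), not the wording used in the paper.
INSTRUCTION = (
    "Answer the following multiple-choice question "
    "with the letter of the correct option."
)


def format_mcq_prompt(question: str, options: Sequence[str], answer: str = "") -> str:
    """Serialize one MCQ into a prompt; pass `answer` to build a training target."""
    if not 2 <= len(options) <= len(LABELS):
        raise ValueError("this sketch supports between 2 and 4 options")
    lines = [INSTRUCTION, "", f"Question: {question}"]
    # Pair each option with its letter label: "A. ...", "B. ...", ...
    lines += [f"{label}. {text}" for label, text in zip(LABELS, options)]
    # An empty answer leaves "Answer:" for the model to complete at inference time.
    lines.append(f"Answer: {answer}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_mcq_prompt("ما عاصمة المغرب؟", ["الرباط", "فاس", "مراكش", "طنجة"], answer="A"))
```

For inference, omit `answer` so the prompt ends with `Answer:` and the model completes the letter; for fine-tuning, pass the gold label so it becomes part of the target sequence.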