Improving Multilingual Math Reasoning for African Languages

Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Esther Adenuga, David Ifeoluwa Adelani, Jimmy Lin

TL;DR

The paper systematically evaluates strategies for adapting LLMs to mathematical reasoning in nine African languages, contrasting synthetic native-language data with translated datasets across multi-stage training paradigms. It introduces a persona-driven synthetic data pipeline (AfriPersona-Instruct) and assesses translated data (BigMath/OpenMathInstruct) in a controlled Llama 3.1 8B setup, evaluated with an LLM-as-judge on AfriMGSM and on unseen languages. Key findings: native-language synthetic data generally yields strong performance; mixing data sources and multilingual fine-tuning improve cross-lingual transfer; masking prompt tokens in the training loss is not beneficial; and scaling data size markedly improves results. The work provides practical artifacts (AfriPersonaHub, AfriPersona-Instruct) and actionable guidance for building robust multilingual math reasoning in low-resource African languages, while acknowledging limitations such as domain specificity and the need for data verification and domain-aligned pretraining.
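
The prompt-masking ablation mentioned in the TL;DR can be made concrete with a small sketch. The snippet below is not the authors' training code; it only illustrates what "masking prompts" usually means in supervised fine-tuning: prompt tokens are given an ignore label so that only response tokens contribute to the loss. The helper names `build_labels` and `count_supervised` and the toy token ids are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss; using it here is an assumption about the setup.

```python
from typing import List, Tuple

# Label value skipped by the loss. -100 is PyTorch's default ignore_index;
# assuming the same convention here is an illustration, not a claim about the paper.
IGNORE_INDEX = -100


def build_labels(
    prompt_ids: List[int], response_ids: List[int], mask_prompt: bool
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) for one prompt/response training example."""
    input_ids = list(prompt_ids) + list(response_ids)
    if mask_prompt:
        # Prompt positions are ignored; only response tokens are supervised.
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    else:
        # Every token, prompt included, contributes to the loss.
        labels = list(input_ids)
    return input_ids, labels


def count_supervised(labels: List[int]) -> int:
    """Number of positions that actually contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)


# Toy token ids, purely illustrative.
prompt = [11, 12, 13]
response = [21, 22]

_, masked = build_labels(prompt, response, mask_prompt=True)
_, unmasked = build_labels(prompt, response, mask_prompt=False)
print(count_supervised(masked), count_supervised(unmasked))
```

With masking off, the model is also trained to predict the prompt text, which adds supervised tokens per example; the summary above reports that turning masking on was not beneficial in the authors' experiments.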

Abstract

Researchers working on low-resource languages face persistent challenges due to limited data availability and restricted access to computational resources. Although most large language models (LLMs) are predominantly trained on high-resource languages, adapting them to low-resource contexts, particularly African languages, requires specialized techniques. Several strategies have emerged for adapting models to low-resource languages in today's LLM landscape, defined by multi-stage pre-training and post-training paradigms. However, the most effective approaches remain uncertain. This work systematically investigates which adaptation strategies yield the best performance when extending existing LLMs to African languages. We conduct extensive experiments and ablation studies to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focus on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.

Paper Structure

This paper contains 34 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Heatmap of performance for each language across model configurations, with Llama 3.1 8B Instruct fine-tuned on different datasets.