NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance

Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim

TL;DR

This study introduces NCTB-QA, a large-scale Bangla question answering dataset of 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board, and demonstrates that domain-specific fine-tuning is critical for robust performance in low-resource settings.

Abstract

Reading comprehension systems for low-resource languages struggle with unanswerable questions: they tend to produce unreliable responses when the correct answer is absent from the context. To address this problem, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a balanced distribution of answerable (57.25%) and unanswerable (42.75%) questions. It also includes adversarially designed instances containing plausible distractors. We benchmark three transformer-based models (BERT, RoBERTa, ELECTRA) and demonstrate substantial improvements through fine-tuning. BERT achieves a 313% relative improvement in F1 score (from 0.150 to 0.620). Semantic answer quality, measured by BERTScore, also increases significantly across all models. Our results establish NCTB-QA as a challenging benchmark for Bangla educational question answering and demonstrate that domain-specific fine-tuning is critical for robust performance in low-resource settings.
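
The relative improvement quoted for BERT can be checked directly from the two F1 scores given in the abstract. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_improvement` is our own and is not taken from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline score must be non-zero")
    return (after - before) / before * 100.0


# F1 scores for BERT before and after fine-tuning, as reported in the abstract.
bert_gain = relative_improvement(0.150, 0.620)
print(f"{bert_gain:.0f}%")  # prints "313%"
```

A relative figure this large reflects the low starting point as much as the final score: the absolute gain is 0.470 F1.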

Paper Structure

This paper contains 28 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of the content extraction and QA generation pipeline. Markdown textbooks are cleaned, segmented, and thematically grouped into coherent contexts, which are then processed by Gemini to generate structured QA pairs in JSON format.
  • Figure 2: Sample from the NCTB-QA dataset.
  • Figure 3: Distribution of word count per context in NCTB-QA. The distribution spans from 103 to 1,871 words, with the majority of contexts containing between 150 and 400 words.
  • Figure 4: Distribution of word count per question in NCTB-QA. Questions are predominantly concise, with most containing between 7 and 12 words.
  • Figure 5: Distribution of word count per answer in NCTB-QA. The right-skewed distribution shows most answers are brief, while some require extended responses.
  • ...and 2 more figures