PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models
Shivam Sharma, Riya Naik, Tejas Gawas, Heramb Patil, Kunal Korgaonkar
TL;DR
The paper addresses the challenge of building curriculum-aligned QA for education by introducing the NCERT-QA dataset and the PustakAI framework. It employs a retrieval-augmented prompting pipeline with varied strategies (including meta-prompting and one-shot prompts) to test model performance across English and Science for grades 6–8. Key findings show that contextual grounding is essential for accurate, faithful answers, with larger models and meta-prompting delivering the best balance of accuracy and efficiency. The work offers a practical path toward deploying curriculum-aware educational AI in resource-constrained schools and sets the stage for expanding to additional grades and subjects.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like content, which has transformed sectors such as healthcare, software development, and education. In education, LLMs offer potential for personalized and interactive learning experiences, especially in regions with limited teaching resources. However, adapting these models effectively to curriculum-specific content, such as the National Council of Educational Research and Training (NCERT) syllabus in India, presents unique challenges in terms of accuracy, alignment, and pedagogical relevance. In this paper, we present the framework "PustakAI" (Pustak means "book" in many Indian languages) for the design and evaluation of a novel question-answering dataset, "NCERT-QA", aligned with the NCERT curriculum for English and Science subjects of grades 6 to 8. We classify the curated QA pairs as Factoid, Inferential, and Others (evaluative and reasoning). We evaluate the dataset with various prompting techniques, such as meta-prompt, few-shot, and CoT-style prompting, using diverse evaluation metrics to understand which approach aligns most efficiently with the structure and demands of the curriculum. Along with the usability of the dataset, we analyze the strengths and limitations of current open-source LLMs (Gemma3:1b, Llama3.2:3b, and Nemotron-mini:4b) and high-end LLMs (Llama-4-Scout-17B and Deepseek-r1-70B) as AI-based learning tools in formal education systems.
