EduMod-LLM: A Modular Approach for Designing Flexible and Transparent Educational Assistants
Meenakshi Mittal, Rishi Khare, Mihran Miroyan, Chancharik Mitra, Narges Norouzi
TL;DR
EduMod-LLM presents a modular function-calling framework for educational QA that isolates function calling, retrieval, and generation, enabling fine-grained analysis of each component on real student questions. It introduces an LLM-as-a-Judge module aligned with TA standards to automate pedagogical-quality evaluation at scale. The work demonstrates that structure-aware retrieval and multihop function calling substantially improve retrieval relevance and response quality, with GPT-4.1 delivering strong generation performance. Overall, modular design enhances transparency, adaptability across courses, and pedagogical alignment in educational AI assistants.
Abstract
With the growing use of Large Language Model (LLM)-based Question-Answering (QA) systems in education, it is critical to evaluate their performance across individual pipeline components. In this work, we introduce EduMod-LLM, a modular function-calling LLM pipeline, and present a comprehensive evaluation along three key axes: function calling strategies, retrieval methods, and generative language models. Our framework enables fine-grained analysis by isolating and assessing each component. We benchmark function-calling performance across LLMs, compare our novel structure-aware retrieval method to vector-based and LLM-scoring baselines, and evaluate various LLMs for response synthesis. This modular approach reveals specific failure modes and performance patterns, supporting the development of interpretable and effective educational QA systems. Our findings demonstrate the value of modular function calling in improving system transparency and pedagogical alignment. Website and Supplementary Material: https://chancharikmitra.github.io/EduMod-LLM-website/
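The three-stage decomposition described in the abstract (function calling, then retrieval, then generation) can be illustrated with a minimal Python sketch, assuming each stage is a plain function with a fixed input/output contract. All names here (`Trace`, `run_pipeline`, `keyword_function_caller`, `lookup_retriever`, `echo_generator`, `function_call_exact_match`, and the toy `COURSE_MATERIALS`) are hypothetical stand-ins, not EduMod-LLM's actual interfaces; a real system would plug in an LLM function caller, a retriever such as the structure-aware method, and an LLM generator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Trace:
    """Stores every stage's output so each component can be evaluated in isolation."""
    question: str
    function_calls: List[str] = field(default_factory=list)
    retrieved: List[str] = field(default_factory=list)
    answer: str = ""


# Hypothetical stage contracts: any implementation matching a signature can be
# swapped in without touching the other stages.
FunctionCaller = Callable[[str], List[str]]        # question -> tool names to call
Retriever = Callable[[str, List[str]], List[str]]  # question, tools -> passages
Generator = Callable[[str, List[str]], str]        # question, passages -> answer


def run_pipeline(question: str, call_functions: FunctionCaller,
                 retrieve: Retriever, generate: Generator) -> Trace:
    """Run the three stages in order, recording each intermediate result."""
    trace = Trace(question)
    trace.function_calls = call_functions(question)
    trace.retrieved = retrieve(question, trace.function_calls)
    trace.answer = generate(question, trace.retrieved)
    return trace


# Toy course materials keyed by hypothetical tool names (illustrative data only).
COURSE_MATERIALS: Dict[str, List[str]] = {
    "get_syllabus": ["Homework 3 is due Friday at 11:59 PM."],
    "get_lecture_notes": ["Lecture 5 covers recursion and base cases."],
}


def keyword_function_caller(question: str) -> List[str]:
    """Stand-in for an LLM function caller: picks tools by keyword matching."""
    q = question.lower()
    calls = []
    if "due" in q or "deadline" in q:
        calls.append("get_syllabus")
    if "recursion" in q or "lecture" in q:
        calls.append("get_lecture_notes")
    return calls


def lookup_retriever(question: str, calls: List[str]) -> List[str]:
    """Stand-in retriever: returns every passage behind the selected tools."""
    return [p for name in calls for p in COURSE_MATERIALS.get(name, [])]


def echo_generator(question: str, passages: List[str]) -> str:
    """Stand-in generator: echoes retrieved passages or declines to answer."""
    if not passages:
        return "I could not find this in the course materials."
    return " ".join(passages)


def function_call_exact_match(trace: Trace, expected: List[str]) -> bool:
    """Score only the function-calling stage, independent of later stages."""
    return sorted(trace.function_calls) == sorted(expected)


if __name__ == "__main__":
    t = run_pipeline("When is homework 3 due?", keyword_function_caller,
                     lookup_retriever, echo_generator)
    print(t.function_calls)  # ['get_syllabus']
    print(t.answer)          # Homework 3 is due Friday at 11:59 PM.
```

Because each intermediate result is kept in the trace, an error can be attributed to one component (a wrong tool choice, a missing passage, or a poor final answer), which mirrors the per-component isolation the abstract describes. The exact-match scorer is only an illustrative metric; response quality in the paper is assessed by an LLM-as-a-Judge aligned with TA standards.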
