Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems
Zhangqi Duan, Nigel Fernandez, Arun Balajiee Lekshmi Narayanan, Mohammad Hassany, Rafaella Sampaio de Alencar, Peter Brusilovsky, Bita Akram, Andrew Lan
TL;DR
This work presents KCGen-KT, an end-to-end framework that automates knowledge component generation for open-ended programming problems and leverages KC semantics to improve knowledge tracing. The KC generation pipeline uses GPT-4o to create fine-grained KCs per problem, clusters them to control abstraction, and labels clusters to form a problem-KC mapping, yielding high-quality, interpretable KC descriptions. The KCGen-KT KT model employs Llama 3 with soft KC mastery tokens, modeling per-KC mastery and predicting both future code and problem correctness in a multi-task objective, with interpretability enforced via averaging KC mastery and a monotonic KC regularization term. Empirical results on CodeWorkout (Java) and FalconCode (Python) show KCGen-KT with generated KCs outperforming baselines and human-written KCs on future performance prediction, with better learning-curve fits to cognitive models; a human evaluation confirms the generated KCs are interpretable and cover essential problem concepts. The approach reduces manual KC tagging effort and offers a scalable, interpretable KT framework with potential applicability beyond programming to other student modeling domains.
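As a rough illustration of the interpretability mechanism mentioned above, the following minimal sketch assumes that per-KC mastery is a value in [0, 1], that a problem's predicted correctness is the plain average of the mastery of its tagged KCs, and that the monotonic term penalizes drops in mastery between consecutive time steps. The function names, the hinge-style penalty, and the example KC labels are illustrative assumptions, not the paper's implementation; the actual regularizer may be defined differently.

```python
from typing import Dict, List


def predict_correctness(mastery: Dict[str, float], problem_kcs: List[str]) -> float:
    """Average the mastery of the KCs tagged to a problem (assumed aggregation)."""
    if not problem_kcs:
        raise ValueError("a problem must be tagged with at least one KC")
    return sum(mastery[kc] for kc in problem_kcs) / len(problem_kcs)


def monotonic_penalty(history: List[Dict[str, float]]) -> float:
    """Sum of per-KC mastery decreases between consecutive time steps.

    Assumption: 'monotonic' is read here as 'mastery should not drop over time';
    this is one possible interpretation, not the paper's stated definition.
    """
    penalty = 0.0
    for prev, curr in zip(history, history[1:]):
        for kc, prev_value in prev.items():
            # Only decreases are penalized; increases cost nothing.
            penalty += max(0.0, prev_value - curr[kc])
    return penalty


# Hypothetical KC labels and mastery values, for illustration only.
mastery_now = {"for loop": 0.8, "if statement": 0.4}
p_correct = predict_correctness(mastery_now, ["for loop", "if statement"])

mastery_history = [
    {"for loop": 0.5, "if statement": 0.5},
    {"for loop": 0.7, "if statement": 0.4},
]
penalty = monotonic_penalty(mastery_history)
```

Because the prediction is a simple average, each KC's contribution to the predicted correctness can be read off directly, which is the sense in which such an aggregation is interpretable.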
Abstract
Knowledge components (KCs) mapped to problems help model student learning by tracking students' mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting KCs and tagging them to problems, traditionally performed by human domain experts, is highly labor-intensive. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework, which we refer to as KCGen-KT, to leverage these LLM-generated KCs. We conduct extensive quantitative and qualitative evaluations on two real-world student code submission datasets in different programming languages. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.
