Knowledge Tracing in Programming Education Integrating Students' Questions
Doyoun Kim, Suin Kim, Yojan Jo
TL;DR
This work addresses knowledge tracing in programming education by incorporating students' questions as signals of their understanding and misconceptions. It presents SQKT, a transformer-based architecture that fuses student questions, educator responses, and automatically extracted skills with code and problem embeddings. Training is guided by a multi-task loss that combines a prediction loss $L_{pred}$, a question loss $L_{question}$, and a triplet loss $L_{triplet}$ weighted by $\lambda$. In in-domain settings the approach achieves up to a 33.1% absolute improvement in AUC, and it generalizes robustly across domains, particularly when it uses question-derived signals and the automatically mapped skills. The findings suggest that questions reveal nuanced conceptual understanding and that GPT-based skill mapping can scale to diverse programming content, enabling more personalized and effective adaptive learning in CS education.
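The multi-task objective can be illustrated with a small Python sketch. It assumes, as the $\lambda$ weight suggests but the summary does not state outright, that the terms are summed as $L = L_{pred} + L_{question} + \lambda L_{triplet}$. It further assumes that $L_{pred}$ is a binary cross-entropy on completion predictions and that $L_{triplet}$ is the standard margin-based triplet loss over embeddings. $L_{question}$ is treated as an opaque precomputed scalar because its form is not given here. The function names, the margin, and the default $\lambda = 0.1$ are hypothetical.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Assumed L_pred: mean BCE between predicted completion probabilities p and 0/1 labels y."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(p)

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Assumed L_triplet: standard max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def total_loss(l_pred, l_question, l_triplet, lam=0.1):
    """Hypothetical combination L = L_pred + L_question + lam * L_triplet."""
    return l_pred + l_question + lam * l_triplet

if __name__ == "__main__":
    l_pred = binary_cross_entropy([0.9, 0.2], [1, 0])
    # Positive lies farther from the anchor than the negative, so the triplet term is nonzero.
    l_trip = triplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])
    loss = total_loss(l_pred, l_question=0.5, l_triplet=l_trip, lam=0.1)  # 0.5 is a placeholder
    print(round(loss, 4))
```

In a real model these terms would be computed on batched tensors with automatic differentiation; the pure-Python version only shows how the weighted pieces combine.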
Abstract
Knowledge tracing (KT) in programming education presents unique challenges due to the complexity of coding tasks and the diverse methods students use to solve problems. Although students' questions often carry valuable signals about their understanding and misconceptions, traditional KT models often ignore these questions as inputs. This paper introduces SQKT (Students' Question-based Knowledge Tracing), a knowledge tracing model that leverages students' questions and automatically extracted skill information to predict students' performance on subsequent problems in programming education more accurately. Our method creates semantically rich embeddings that capture not only the surface-level content of the questions but also the student's mastery level and conceptual understanding. Experimental results across Python programming courses of varying difficulty show that SQKT outperforms baseline models in predicting whether students complete problems. In in-domain experiments, SQKT achieved a 33.1% absolute improvement in AUC compared to baseline models. The model also generalized robustly in cross-domain settings, effectively addressing data scarcity in advanced programming courses. SQKT can be used to tailor educational content to individual learning needs and to design adaptive learning systems in computer science education.
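The headline result is stated as an absolute AUC improvement, meaning a difference between two AUC values rather than a ratio. The sketch below computes ROC AUC for binary completion labels with the pairwise (Mann-Whitney) formulation and contrasts absolute and relative gains. The labels and scores are invented toy values, not data from the paper, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a randomly chosen positive (completed) example
    is scored above a randomly chosen negative one; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [1, 1, 0, 0]             # 1 = student completed the next problem (toy data)
    baseline = [0.8, 0.2, 0.6, 0.4]   # toy baseline scores
    improved = [0.9, 0.6, 0.7, 0.1]   # toy improved scores
    auc_base = roc_auc(labels, baseline)
    auc_new = roc_auc(labels, improved)
    absolute_gain = auc_new - auc_base          # difference in AUC
    relative_gain = absolute_gain / auc_base    # ratio to the baseline AUC
    print(f"absolute: {absolute_gain:.3f}, relative: {relative_gain:.1%}")
    # prints "absolute: 0.250, relative: 50.0%"
```

The same 0.25 gain reads very differently as a relative figure, which is why the distinction between absolute and relative improvement matters when comparing reported AUC results.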
