Towards Robust Knowledge Tracing Models via k-Sparse Attention
Shuyan Huang, Zitao Liu, Xiangyu Zhao, Weiqi Luo, Jian Weng
TL;DR
This work tackles overfitting in attention-based knowledge tracing by introducing sparseKT, a k-sparse attention framework that retains only the historical interactions with the highest attention scores after a self-attention pass. It offers two sparsification strategies, soft-thresholding and top-$K$, and augments interaction embeddings with a question-specific discrimination factor to improve robustness without sacrificing accuracy. Empirical results on three public educational datasets show that sparseKT achieves competitive AUC and accuracy and often ranks in the top tier, generalizing better than SAKT and other baselines, while visualizations of KC relations support interpretability. The code is open source, enabling reproducibility and practical adoption in educational settings.
Abstract
Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interaction sequences. Owing to its ability to capture long-term contextual dependencies, the attention mechanism has become an essential component of many deep learning based KT (DLKT) models. Despite the impressive performance of these attentional DLKT models, many of them are prone to overfitting, especially on small-scale educational datasets. In this paper, we therefore propose \textsc{sparseKT}, a simple yet effective framework that improves the robustness and generalization of attention-based DLKT approaches. Specifically, we incorporate a k-selection module that picks only the items with the highest attention scores. We propose two sparsification heuristics: (1) soft-thresholding sparse attention and (2) top-$K$ sparse attention. We show that \textsc{sparseKT} helps attentional KT models discard irrelevant student interactions and achieves predictive performance comparable to 11 state-of-the-art KT models on three publicly available real-world educational datasets. To encourage reproducible research, we make our data and code publicly available at \url{https://github.com/pykt-team/pykt-toolkit}\footnote{We merged our model into the \textsc{pyKT} benchmark at \url{https://pykt.org/}.}.
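
As a rough illustration of the two sparsification heuristics, the following Python sketch applies each one to a single row of attention weights. The abstract does not spell out the details, so the specifics here are assumptions: top-$K$ is read as keeping the $K$ largest weights and renormalizing them, and soft-thresholding is read as keeping the largest weights until their cumulative mass reaches a threshold. The function names and example scores are hypothetical and are not taken from the pyKT code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(weights, keep):
    """Zero out every weight whose index is not in `keep`, then rescale to sum to 1."""
    kept = [w if i in keep else 0.0 for i, w in enumerate(weights)]
    total = sum(kept)
    return [w / total for w in kept]


def top_k_sparse(weights, k):
    """Assumed top-K rule: keep the k largest weights and renormalize them."""
    if k <= 0:
        raise ValueError("k must be positive")
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return _renormalize(weights, set(order[:k]))


def soft_threshold_sparse(weights, threshold):
    """Assumed soft-thresholding rule: keep the largest weights until their
    cumulative sum reaches `threshold` (a fraction in (0, 1]), then renormalize."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += weights[i]
        if mass >= threshold:
            break
    return _renormalize(weights, keep)


# Hypothetical attention scores of the current step over four past interactions.
weights = softmax([2.0, 1.0, 0.1, -1.0])
print(top_k_sparse(weights, k=2))
print(soft_threshold_sparse(weights, threshold=0.8))
```

In both variants the discarded interactions receive exactly zero weight, so they contribute nothing to the attended representation. The difference in this sketch is that top-$K$ keeps a fixed number of interactions, whereas the thresholded variant lets that number vary with how concentrated the attention distribution is.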
