Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task

Unggi Lee, Jiyeong Bae, Dohee Kim, Sookbun Lee, Jaekwon Park, Taekyung Ahn, Gunho Lee, Damji Stratton, Hyeoncheol Kim

TL;DR

The paper addresses knowledge tracing by leveraging encoder-based pre-trained language models to exploit textual content in questions and knowledge concepts. It formats student interactions into text sequences and fine-tunes PLMs to predict the probability of correctness at the $[MASK]$ position, using a sigmoid output and binary cross-entropy loss ($\hat{y} = \sigma(h)$). On large KT benchmarks, LKT outperforms traditional DKTs, with DeBERTa-V3 and RoBERTa achieving the top AUC/ACC, while on smaller datasets DKTs remain competitive. The study also demonstrates robustness to cold-start via pre-training, provides interpretability through attention analyses and LIME, and suggests a viable direction for integrating PLMs into KT to improve personalized learning outcomes.
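
As a rough illustration of the output layer described above, the sketch below applies $\hat{y} = \sigma(h)$ to a logit $h$ at each $[MASK]$ position and averages the standard binary cross-entropy against the true correctness labels. The function names (`sigmoid`, `bce_loss`) and the example logits and labels are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
import math

def sigmoid(h: float) -> float:
    """Numerically stable logistic function: y_hat = sigma(h)."""
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    z = math.exp(h)
    return z / (1.0 + z)

def bce_loss(logits, labels, eps=1e-12):
    """Mean binary cross-entropy over the masked positions."""
    total = 0.0
    for h, y in zip(logits, labels):
        # Clamp the probability so log() never receives 0.
        p = min(max(sigmoid(h), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

# Hypothetical logits at three [MASK] positions and their true labels
# (1 = the student answered correctly, 0 = incorrectly).
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probs = [sigmoid(h) for h in logits]
loss = bce_loss(logits, labels)
print(probs, loss)
```

In practice the logit would come from the language model's hidden state at the masked position; here it is supplied directly so the arithmetic can be checked by hand.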

Abstract

Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. Because it uses text-rich data, LKT is also more interpretable than traditional KT models. We applied the local interpretable model-agnostic explanations (LIME) technique and analyzed attention scores to further interpret the model's performance. Our work highlights the potential of integrating PLMs with KT and paves the way for future research in the KT domain.

Paper Structure (20 sections, 4 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Comparison of LKT and DKT on XES3G5M-T dataset. LKT, using RoBERTa with text data, outperforms DKT in both cold start and final AUC performance by leveraging rich text-based semantic information, unlike DKT's numerical sequences. The x-axis shows the proportion of the dataset used for cold start, and the y-axis represents AUC performance.
  • Figure 2: The comparison between DKT (left) and LKT (right). LKT uses encoder-based pre-trained LMs ($\mathcal{L}_{\theta_{pre}}$), while DKT models are trained from scratch ($f_{\theta_{init}}$). Data formats differ: DKT uses sequences of numbers (KCs, questions, responses), whereas LKT uses text. The bottom shows interaction data from one student. In LKT, interactions are enclosed by $[CLS]$ and $[EOS]$ tokens, separating KCs and questions. Correctness is indicated by $[CORRECT]$, $[INCORRECT]$, and $[MASK]$ tokens. LKT models predict correctness at the $[MASK]$ position, with 15% of $[CORRECT]$ or $[INCORRECT]$ replaced by $[MASK]$, inspired by BERT (Devlin et al., 2018).
  • Figure 3: We examine the cold start problem in KT and how performance changes as model size increases. The left panel shows the AUC of LKTs pre-trained on DBE-KT22 and DKT trained only on XES3G5M-T across different data sizes (0.1%, 0.5%, 1%, 3%, ..., 15%). The LKTs demonstrate robustness to the cold start problem. The center panel displays AUC scores for different sequence lengths per student (5, 10, 20, etc.). The RoBERTa-based LKT performs well with less data, indicating solid initial performance. The right panel compares the performance of large and base LKTs. Solid lines represent large models, while dashed lines represent base models. RoBERTa and ERNIE models maintain stable AUC performance regardless of size.
  • Figure 4: Performance comparison (AUC) of DKT and LKT models on XES3G5M-T data. The LKT model, pre-trained on DBE-KT22, outperformed the DKT model without additional training on new data. Note that DKT's AUC is 0.5, i.e., chance level, because it cannot make use of pre-training.
  • Figure 5: Visualization of the embedding vectors with t-SNE. The left panel shows BERT embeddings and the right panel shows BERT-LKT embeddings. The BERT-LKT embeddings represent the correctness probability well.
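
The text format described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the exact template (KC text, then question text, then a correctness token, all joined by spaces), the function name `format_student`, and the example interactions are hypothetical. Only the special tokens and the 15% masking rate come from the caption.

```python
import random

CORRECT, INCORRECT, MASK = "[CORRECT]", "[INCORRECT]", "[MASK]"

def format_student(interactions, mask_rate=0.15, seed=0):
    """Turn one student's interactions into a text sequence.

    interactions: list of (kc_text, question_text, is_correct) tuples.
    Returns the text and the 0/1 labels of the masked positions, in order.
    The layout used here is an assumption made for illustration.
    """
    rng = random.Random(seed)
    parts = ["[CLS]"]
    labels = []
    for kc_text, question_text, is_correct in interactions:
        if rng.random() < mask_rate:
            # Hide the response; the model must predict it at this position.
            token = MASK
            labels.append(1 if is_correct else 0)
        else:
            token = CORRECT if is_correct else INCORRECT
        parts.extend([kc_text, question_text, token])
    parts.append("[EOS]")
    return " ".join(parts), labels

# Hypothetical interaction data for one student.
student = [
    ("Fractions", "What is 1/2 + 1/4?", True),
    ("Fractions", "What is 2/3 - 1/6?", False),
]
text, masked_labels = format_student(student)
print(text)
print(masked_labels)
```

Setting `mask_rate=1.0` masks every response, which is one way to build an evaluation input where all correctness values must be predicted; the paper may construct its evaluation inputs differently.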