
Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour

Max Norris, Kobi Gal, Sahan Bulathwela

TL;DR

This work introduces Next Token Knowledge Tracing (NTKT), a framework that reframes knowledge tracing (KT) as a next-token prediction task. It does this by fine-tuning decoder-only LLMs with LoRA on sequences that combine student interaction histories with question text. Selective masking focuses supervision on the correctness token while preserving full-context learning. This lets NTKT exploit pretrained linguistic representations to outperform state-of-the-art KT baselines and to generalise well to cold-start users and unseen questions. In experiments on the Eedi dataset, NTKT achieves substantial gains in F1, accuracy, and AUC over these baselines, and full-text question representations provide the strongest predictive signal. The results highlight the value of incorporating question content into KT. They also point toward scalable, text-aware educational AI that generalises robustly in real-world deployments.
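The TL;DR does not spell out the exact masking scheme. The sketch below is minimal and rests on one assumption: the common causal-LM convention in which a label of -100 is ignored by the cross-entropy loss, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. Under that convention, every token stays in the model input, so the model still conditions on the full history and question text. Only the correctness-token position contributes to the loss. The helper names `build_masked_labels` and `masked_nll` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed convention (PyTorch CrossEntropyLoss default): label -100 means "no loss here".
IGNORE_INDEX = -100


def build_masked_labels(input_ids: Sequence[int], target_positions: Sequence[int]) -> List[int]:
    """Keep every token as model input, but supervise only the correctness token(s).

    Tokens outside target_positions remain in input_ids, so the model attends
    to the full history and question text; they simply contribute no loss.
    """
    targets = set(target_positions)
    return [tok if i in targets else IGNORE_INDEX for i, tok in enumerate(input_ids)]


def masked_nll(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood over supervised positions only.

    probs[i][v] is the model's predicted probability of token v at position i,
    assumed already aligned with labels (i.e. after the usual causal shift).
    """
    losses = [-math.log(probs[i][lab]) for i, lab in enumerate(labels) if lab != IGNORE_INDEX]
    if not losses:
        raise ValueError("no supervised positions")
    return sum(losses) / len(losses)
```

For example, `build_masked_labels([5, 7, 2, 9], target_positions=[3])` returns `[-100, -100, -100, 9]`. The loss therefore depends only on the prediction at the last position. This matches the idea of concentrating supervision on correctness rather than on reproducing the question text.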

Abstract

Modelling student knowledge is a key challenge when leveraging AI in education, with major implications for personalised learning. The Knowledge Tracing (KT) task aims to predict how students will respond to educational questions in learning environments, based on their prior interactions. Existing KT models typically use response correctness along with metadata such as skill tags and timestamps. They often overlook the question text, which is an important source of pedagogical insight. This omission is a missed opportunity and limits predictive performance. We propose Next Token Knowledge Tracing (NTKT), a novel approach that reframes KT as a next-token prediction task using pretrained Large Language Models (LLMs). NTKT represents both student histories and question content as sequences of text, allowing LLMs to learn patterns in both behaviour and language. Across a series of experiments, NTKT significantly outperforms state-of-the-art neural KT models and generalises much better to cold-start questions and users. These findings highlight the importance of question content in KT. They also demonstrate the benefits of leveraging the pretrained representations of LLMs to model student learning more effectively.
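The following is a minimal sketch of turning a student history and the next question into a single text sequence. It concatenates each past step and then the next question, in the spirit of the $\oplus$ operator in Figure 1. The "Question: … / Correct: yes|no" template is a hypothetical assumption. The paper's actual serialisation format, and any further variables it concatenates, are not reproduced here. `Interaction`, `format_step` and `serialise` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interaction:
    question_text: str  # full question text
    correct: bool       # observed response correctness


def label(correct: bool) -> str:
    # Hypothetical correctness tokens; the real label vocabulary is an assumption.
    return "yes" if correct else "no"


def format_step(question_text: str) -> str:
    # Hypothetical per-step template ending right before the correctness token.
    return f"Question: {question_text}\nCorrect:"


def serialise(history: List[Interaction], target_step: Interaction) -> Tuple[str, str]:
    """Concatenate past interactions and the next question into one prompt.

    Returns (prompt, target): the LLM reads the prompt and is trained to
    produce the target correctness token as its next token.
    """
    parts = [f"{format_step(it.question_text)} {label(it.correct)}" for it in history]
    parts.append(format_step(target_step.question_text))
    return "\n".join(parts), f" {label(target_step.correct)}"
```

An empty history still yields a valid prompt made only of the question text. That is one intuition for why a text-based model can make informed predictions for cold-start users.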

Paper Structure

This paper contains 29 sections, 3 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: NTKT pipeline: data preparation (orange) and fine-tuning (green). Grey circles represent the observable variables in the dataset, including the target variable. $\oplus$ denotes the concatenation operator, which joins variable values to create training examples. LoRA training keeps the LLM weights $\Phi_0$ frozen and trains the adapter weights $\Theta$ for each layer.
  • Figure 2: Average F1 score across timesteps for NTKT and baseline models in user cold-start scenarios. Curves show average F1 across all held-out users, from the initial interaction through subsequent steps.
  • Figure 3: Performance of NTKT and baseline models on seen versus cold-start questions, showing stable generalisation across unseen content.
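Figure 1 describes LoRA only at a high level: frozen weights $\Phi_0$ and trainable adapters $\Theta$ in each layer. The following is a minimal NumPy sketch of one adapted linear layer in the standard LoRA formulation, $h = W_0 x + \frac{\alpha}{r} B A x$. Here $W_0$ is part of $\Phi_0$, the low-rank factors $A$ and $B$ form part of $\Theta$, and $B$ is zero-initialised. The class name `LoRALinear`, the rank, scale and initialisation values are illustrative assumptions, not the paper's settings.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W0 plus a trainable low-rank update B @ A (sketch)."""

    def __init__(self, w0: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                          # part of Phi_0: frozen
        self.a = rng.normal(0.0, 0.01, size=(rank, d_in))     # part of Theta: trainable
        self.b = np.zeros((d_out, rank))                      # part of Theta: zero init
        self.scale = alpha / rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the adapter path adds a rank-r update.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged(self) -> np.ndarray:
        # After training, the update can be folded into one dense weight matrix.
        return self.w0 + self.scale * self.b @ self.a

    def trainable_params(self) -> int:
        # Only A and B would be optimised; W0 is excluded.
        return self.a.size + self.b.size
```

Because $B$ starts at zero, the adapted layer initially reproduces the pretrained layer exactly. Training then updates only $r(d_{in} + d_{out})$ parameters per layer instead of $d_{in} d_{out}$, which is what makes fine-tuning a large decoder-only model tractable.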