Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

Zhaoxing Li; Jujie Yang; Jindi Wang; Lei Shi; Sebastian Stein

TL;DR

LBKT addresses long-sequence knowledge tracing in Intelligent Tutoring Systems by combining a BERT-based Transformer with an LSTM and Rasch model-based embeddings that encode difficulty information. The architecture computes $E_{Rasch} = E_d + E_d \times E_q$ and a final embedding $E = E_{Rasch} + E_{BERT\,Token} + E_{Position}$, passes it through a 12-layer Transformer backbone, and follows that with an LSTM component and an NN projection, which allows efficient processing of sequences longer than 400 interactions. Across five benchmark ITS datasets (including Assistments, EdNet, Junyi Academy, and Algebra06), LBKT achieves the best ACC and AUC on most of them, while also training faster and using less memory than strong baselines; an ablation study confirms the critical roles of the Rasch embeddings and the LSTM. The work contributes a scalable, interpretable KT framework with potential for real-time deployment, and it motivates future extensions to multi-type data (exercises, concepts) to further improve interpretability and applicability in adaptive learning settings.
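The two embedding formulas in the TL;DR can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not confirm: $E_d$ is read as a difficulty embedding and $E_q$ as a question embedding, $\times$ is read as an element-wise product, and all table sizes, the helper names (`make_table`, `rasch_embedding`, `final_embedding`), and the random toy tables are hypothetical stand-ins for learned parameters. It shows only how the vectors are composed, not the Transformer or LSTM stages.

```python
import random


def make_table(num_rows, dim, rng):
    """Return a toy embedding table: num_rows vectors of length dim."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def rasch_embedding(e_d, e_q):
    """E_Rasch = E_d + E_d * E_q, with * read as element-wise (an assumption)."""
    return [d + d * q for d, q in zip(e_d, e_q)]


def final_embedding(e_rasch, e_token, e_position):
    """E = E_Rasch + E_BERTToken + E_Position, summed element-wise."""
    return [r + t + p for r, t, p in zip(e_rasch, e_token, e_position)]


rng = random.Random(0)
DIM = 8            # hypothetical embedding width
NUM_QUESTIONS = 5  # hypothetical number of distinct questions
MAX_LEN = 4        # hypothetical maximum sequence length

# Random tables stand in for learned parameters (assumption).
difficulty_table = make_table(NUM_QUESTIONS, DIM, rng)  # E_d lookup
question_table = make_table(NUM_QUESTIONS, DIM, rng)    # E_q lookup
token_table = make_table(NUM_QUESTIONS, DIM, rng)       # stand-in for E_BERTToken
position_table = make_table(MAX_LEN, DIM, rng)          # E_Position lookup

question_ids = [3, 1, 4, 1]  # a toy interaction sequence

# Build one final embedding vector per interaction in the sequence.
sequence = []
for pos, qid in enumerate(question_ids):
    e_rasch = rasch_embedding(difficulty_table[qid], question_table[qid])
    sequence.append(final_embedding(e_rasch, token_table[qid], position_table[pos]))

print(len(sequence), len(sequence[0]))
```

In a full model the resulting sequence of vectors would be the input to the Transformer backbone; here it is simply a list of `MAX_LEN` vectors of width `DIM`.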

Abstract

The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analysing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, as Intelligent Tutoring Systems have developed, large-scale datasets containing long-sequence data have begun to emerge. Recent deep-learning-based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with such datasets. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose an LSTM- and BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT. It uses a BERT-based architecture with a Rasch model-based embeddings block to handle difficulty-level information and an LSTM block to process the sequential characteristics of students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC. Additionally, an ablation study is conducted to analyse the impact of each component on LBKT's overall performance. Moreover, we use t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster and more interpretable, and has a lower memory cost, than traditional deep-learning-based Knowledge Tracing methods.
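For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how ACC and AUC are commonly computed for binary correct/incorrect predictions. The 0.5 decision threshold, the function names, and the toy labels and probabilities are assumptions for illustration; they are not taken from the paper's evaluation code or results.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def auc(labels, probs):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = student answered correctly, 0 = incorrectly.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.6]

print(accuracy(labels, probs))
print(auc(labels, probs))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a sketch; rank-based implementations are the usual choice at dataset scale.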

Paper Structure (17 sections, 4 equations, 3 figures, 6 tables)

Introduction
Related Work
  Knowledge Tracing
  Transformer-based Model and Application
Methodology
  Problem Statement
  Proposed Model Architecture
Experiment Setting
  Datasets
  Baseline Models
  Evaluation Metrics and Validation
  Hyperparameters for Experiments
Results and Discussion
  Overall Performance
  Ablation Study
...and 2 more sections

Figures (3)

Figure 1: The architecture of LBKT. LBKT consists of three components: 1) the Rasch model-based embeddings (on the left), 2) the BERT-based architecture (in the middle), and 3) the LSTM block (on the right).
Figure 2: Speed performance comparison of each model when processing data sequences with varying lengths. The vertical axis is the speed ($10^4$ samples per sec).
Figure 3: Visualisation of the embedding vector using t-SNE: without Rasch embeddings (on the left) and with Rasch embeddings (on the right). The colour bar is the predicted probability of the outputs.
