MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing

Runze Li, Kedi Chen, Guwei Feng, Mo Yu, Jun Wang, Wei Zhang

Abstract

Knowledge Tracing (KT) models students' evolving knowledge states to predict future performance, serving as a foundation for personalized education. While traditional deep learning models achieve high accuracy, they often lack interpretability. Large Language Models (LLMs) offer strong reasoning capabilities but struggle with limited context windows and hallucinations. Furthermore, existing LLM-based methods typically require expensive fine-tuning, limiting scalability and adaptability to new data. We propose MERIT (Memory-Enhanced Retrieval for Interpretable Knowledge Tracing), a training-free framework combining frozen LLM reasoning with structured pedagogical memory. Rather than updating parameters, MERIT transforms raw interaction logs into an interpretable memory bank. The framework uses semantic denoising to categorize students into latent cognitive schemas and constructs a paradigm bank where representative error patterns are analyzed offline to generate explicit Chain-of-Thought (CoT) rationales. During inference, a hierarchical routing mechanism retrieves relevant contexts, while a logic-augmented module applies semantic constraints to calibrate predictions. By grounding the LLM in interpretable memory, MERIT achieves state-of-the-art performance on real-world datasets without gradient updates. This approach reduces computational costs and supports dynamic knowledge updates, improving the accessibility and transparency of educational diagnosis.

Paper Structure (45 sections, 8 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overview of the MERIT framework. The architecture integrates a frozen LLM with an external cognitive memory across four stages. Cognitive Schema Discovery first discretizes the student space using semantic denoising and density-based clustering. Interpretative Memory Bank Construction then transforms expert reasoning traces into a static retrieval database. Subsequently, Hierarchical Cognitive Retrieval applies global centroid routing and local hybrid search to maintain domain consistency. Finally, Logic-Augmented Reasoning incorporates semantic difficulty calibration and explicit boundary constraints to regulate the prediction.
  • Figure 2: Parameter sensitivity analysis on ASSISTments 2009 using Gemini-2.5-Flash. (Left) Impact of retrieval volume $k$. Performance peaks at $k=3$, suggesting that a compact context window limits noise from irrelevant samples. (Right) Sensitivity to hybrid search weight $\alpha$. A weight of $0.7$ optimally balances latent semantic features with symbolic keyword matching.
  • Figure 3: UMAP visualization of cognitive schema discovery. (a) Embeddings from raw logs overlap significantly, suggesting statistical artifacts (e.g., IDs, scores) obscure underlying patterns. (b) Conversely, Semantic Denoising yields distinct, compact clusters, confirming effective separation of cognitive signals from noise.
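
The captions for Figures 1 and 2 describe retrieval as global centroid routing followed by a local hybrid search with weight $\alpha$ and retrieval volume $k$. The Python sketch below illustrates one way such a two-stage lookup could work. It is not the authors' implementation: the scoring rule $\alpha \cdot \text{cosine} + (1-\alpha) \cdot \text{Jaccard}$, the `MemoryEntry` structure, the function names, and the toy data are all assumptions made for illustration. Only the default values $k=3$ and $\alpha=0.7$ come from the Figure 2 caption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class MemoryEntry:
    """Hypothetical memory-bank record (not the paper's actual schema)."""
    entry_id: str
    schema: str              # cognitive schema (cluster) the entry belongs to
    vec: Sequence[float]     # dense embedding of the entry
    terms: Set[str]          # keywords used for symbolic matching


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword overlap, assumed here as the symbolic matching score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_to_schema(query_vec: Sequence[float],
                    centroids: Dict[str, Sequence[float]]) -> str:
    """Global stage: pick the schema whose centroid is closest to the query."""
    return max(centroids, key=lambda name: cosine(query_vec, centroids[name]))


def hybrid_retrieve(query_vec: Sequence[float], query_terms: Set[str],
                    bank: List[MemoryEntry],
                    centroids: Dict[str, Sequence[float]],
                    k: int = 3, alpha: float = 0.7) -> List[str]:
    """Local stage: rank entries of the routed schema by the assumed hybrid score."""
    schema = route_to_schema(query_vec, centroids)
    candidates = [e for e in bank if e.schema == schema]
    scored = [
        (alpha * cosine(query_vec, e.vec)
         + (1 - alpha) * jaccard(query_terms, e.terms), e.entry_id)
        for e in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry_id for _, entry_id in scored[:k]]


# Toy data, invented purely to exercise the functions above.
centroids = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
bank = [
    MemoryEntry("a1", "A", [1.0, 0.0], {"fractions"}),
    MemoryEntry("a2", "A", [0.9, 0.1], {"fractions", "addition"}),
    MemoryEntry("a3", "A", [0.5, 0.5], {"geometry"}),
    MemoryEntry("a4", "A", [0.0, 1.0], {"algebra"}),
    MemoryEntry("b1", "B", [0.0, 1.0], {"fractions", "addition"}),
]
result = hybrid_retrieve([1.0, 0.1], {"fractions", "addition"}, bank, centroids)
```

In this toy setup the query is routed to schema `A`, so entry `b1` is never scored even though its keywords match exactly. This is one reading of how routing could "maintain domain consistency" as the Figure 1 caption puts it. Setting `alpha=1.0` would reduce the ranking to pure embedding similarity, and `alpha=0.0` to pure keyword overlap.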