Predictive, scalable and interpretable knowledge tracing on structured domains

Hanqi Zhou; Robert Bamler; Charley M. Wu; Álvaro Tejero-Cantero

TL;DR

PSI-KT is a hierarchical generative approach that explicitly models how both individual cognitive traits and the prerequisite structure of knowledge influence learning dynamics, thus achieving interpretability by design. It also targets the real-world need for efficient personalization even with a growing body of learners and learning histories.

Abstract

Intelligent tutoring systems optimize the selection and timing of learning materials to enhance understanding and long-term retention. This requires estimates of both the learner's progress ("knowledge tracing"; KT) and the prerequisite structure of the learning domain ("knowledge mapping"). While recent deep learning models achieve high KT accuracy, they do so at the expense of the interpretability of psychologically inspired models. In this work, we present a solution to this trade-off. PSI-KT is a hierarchical generative approach that explicitly models how both individual cognitive traits and the prerequisite structure of knowledge influence learning dynamics, thus achieving interpretability by design. Moreover, by using scalable Bayesian inference, PSI-KT targets the real-world need for efficient personalization even with a growing body of learners and learning histories. Evaluated on three datasets from online learning platforms, PSI-KT achieves superior multi-step predictive accuracy and scalable inference in continual-learning settings, all while providing interpretable representations of learner-specific traits and the prerequisite structure of knowledge that causally supports learning. In sum, predictive, scalable and interpretable knowledge tracing with solid knowledge mapping lays a key foundation for effective personalized learning to make education accessible to a broad, global audience.

Paper Structure (55 sections, 26 equations, 13 figures, 16 tables)

Introduction
Background
Knowledge tracing for intelligent tutoring systems
Related work
Joint dynamical and structural model of learning
Probabilistic state-space generative model
State-space model.
Knowledge states $\bm{z}$.
Learner-specific cognitive traits $s$.
Shared prerequisite graph $\mathcal{A}$.
Approximate Bayesian Inference and Amortization with a Neural Network
Inference on a fixed learning history
Inference in continual learning
Predictions
Evaluations
...and 40 more sections

Figures (13)

Figure 1: PSI-KT is a hierarchical probabilistic state-space model of learning. (a) Latent knowledge states for different KCs (colored curves) are inferred from observations. (b) Full hierarchical model for a single learner: cognitive traits $s_n$ control the coupled dynamics of states $z^k_n$, which give rise to observations $y_n$. (c) The dynamics combine memory decay (Eq. \ref{eq:ou-process-marginal}) and structural influences (Eq. \ref{eq:ou-process-mean}).
Figure 2: Within-learner prediction performance (mean $\pm$ SEM) as a function of cohort size, from 100 to the maximum available in each dataset (HLR is omitted for legibility; see Table \ref{tab:generalization-results}).
Figure 3: Continual learning. (Top) Cumulative training time. (Bottom) Prediction accuracy on the next 10 time steps. We omit results when time is above, or accuracy is below, the range of the axes.
Figure 4: Operational interpretability of representations, Junyi15 dataset. See text for axis labels and Appendix \ref{appsec:interpretability-learner-mixed-effect} for additional results.
Figure 5: Graph interpretability. (a) Subgraph inferred by PSI-KT on the Junyi15 dataset, showing prerequisites of target KC 'area of parallelograms'. (b) Hypothesized causal graphs, where Graph 1 assumes a causal relationship exists from KC $i$ to KC $k$, while Graph 0 is the null hypothesis. (c) Regression of edge probabilities against causal supports. Insets show the best baseline model.
...and 8 more figures
