Tracking student skills real-time through a continuous-variable dynamic Bayesian network

Hildo Bijl

TL;DR

This paper introduces Performance Distribution Tracing (PDT), a continuous-variable Dynamic Bayesian Network approach for real-time, explainable knowledge tracing. By modeling each skill's success rate as a distribution over $[0,1]$ built from Beta-based basis functions, PDT provides analytically tractable online updates and explicit uncertainty. It extends to multi-skill exercises through probability polynomials $x(a,b)$, which enable updating via marginalization and a polynomial-based likelihood, and it integrates subskills and correlated skills through smoothing and merging operations. The framework yields live skill distributions that students and teachers can interpret, with the potential to guide targeted practice and reduce bias. It is demonstrated through a practical deployment, and the paper outlines a plan for broader statistical evaluation and extensions.

Abstract

The field of Knowledge Tracing focuses on predicting a student's success rate for a given skill. Modern methods like Deep Knowledge Tracing provide accurate estimates given enough data, but because they are based on neural networks they struggle to explain how these estimates are formed. More classical methods like Dynamic Bayesian Networks can explain their estimates, but they cannot indicate how accurate those estimates are, and their high computational load often makes it hard to incorporate new observations in real time. This paper presents a novel method, Performance Distribution Tracing (PDT), in which the distribution of the success rate is traced live. It uses a Dynamic Bayesian Network with continuous random variables as nodes. Because the full success rate distribution is traced, an indication of the accuracy of any success rate estimate is always available. In addition, data from similar or related skills can be combined to arrive at a more informed estimate of success rates. This makes it possible to predict exercise success rates with both explainability and an accuracy indication, even when an exercise requires a combination of different skills to solve. Through the use of beta distributions as conjugate priors, all distributions are available in analytical form, allowing efficient online updates upon new observations. Experiments have shown that end users generally perceive the resulting estimates as accurate enough to accept recommendations based on them.
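The conjugate-prior update mentioned in the abstract can be illustrated for the simplest case: a single skill, with no learning effect and no links to other skills. The sketch below is not the paper's implementation. It assumes a flat Beta(1, 1) prior, and the class name `SkillEstimate` and its fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SkillEstimate:
    """Beta(alpha, beta) belief over one skill's success rate (hypothetical helper)."""

    alpha: float = 1.0  # assumption: flat prior Beta(1, 1) before any observation
    beta: float = 1.0

    def update(self, correct: bool) -> None:
        # Conjugacy: a single correct/incorrect observation only
        # increments one of the two Beta parameters.
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Expected success rate under the current belief.
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        # Standard deviation of the Beta distribution, used here as
        # a simple indication of how certain the estimate is.
        total = self.alpha + self.beta
        variance = self.alpha * self.beta / (total**2 * (total + 1.0))
        return math.sqrt(variance)


estimate = SkillEstimate()
for outcome in [True, False, True]:  # 2 correct, 1 incorrect
    estimate.update(outcome)
print(estimate.mean, estimate.std)
```

Each update is a constant-time parameter increment, which is what makes online use cheap in this simplified setting. The standard deviation shrinks as more observations arrive, giving the accuracy indication alongside the estimate.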

Paper Structure (22 sections, 48 equations, 4 figures)

This paper contains 22 sections, 48 equations, 4 figures.

Figures (4)

  • Figure 1: Three example distributions $f_{\underline{a}}(a)$ plotted from $0$ to $1$. They all have an expected value of $0.6$, denoting a $60\%$ estimated success rate, but their peakedness and hence their degree of certainty varies. (The functions correspond to the situations [red: 2 correct, 1 incorrect; $n = 3, c_2 = 1$], [blue: 14 correct, 9 incorrect; $n = 23, c_{14} = 1$] and [green: 29 correct, 19 incorrect; $n = 48, c_{29} = 1$] with no learning effect or links taken into account.)
  • Figure 2: The joint prior for two subsequent skill success rates $\underline{a}_k$ and $\underline{a}_{k+1}$, for smoothing order $n_s=10$. Note that, given a value of $\underline{a}_k$, the value of $\underline{a}_{k+1}$ will very likely be similar. Higher smoothing orders $n_s$ give a more peaked ridge at the diagonal line, and hence assume a stronger correlation between $\underline{a}_k$ and $\underline{a}_{k+1}$.
  • Figure 3: An example distribution (green, $n = 48, c_{29} = 1$) that is smoothed through orders $128$, $64$, $32$, $16$, $8$, $4$, $2$ and $1$ (top to bottom) respectively. Higher smoothing orders leave the function mostly unchanged, while lower smoothing orders 'flatten' it more. In fact, a smoothing order of $1$ guarantees a straight line, and a smoothing order of $0$ results in the flat prior $f_{\underline{a}}(a) = 1$. This smoothing is used to introduce extra uncertainty into the distribution, for instance due to the learning effect.
  • Figure 4: Overview of how to incorporate data from subskills and correlated skills. The star distributions, which are stored in the database, denote the distributions after incorporating a new observation from an exercise, but before applying any smoothing for practice and time decay. For smoothing use \ref{eq:SmoothingCoefficients}, for inference use \ref{eq:InferenceCoefficients} and for merging use \ref{eq:MergingCoefficients}.
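The three distributions in Figure 1 can be checked numerically. The sketch below assumes that a distribution is stored as a coefficient vector $c_0, \ldots, c_n$ over basis functions that are the Beta densities with parameters $(i + 1, n - i + 1)$. This reading is consistent with the parameters and the $0.6$ expected value quoted in the caption, but it is an assumption about the paper's notation, and the function names `moments` and `single_basis` are hypothetical.

```python
import math


def moments(coefficients):
    """Mean and standard deviation of f(a) = sum_i c_i * Beta(a; i + 1, n - i + 1).

    Assumption: `coefficients` holds c_0..c_n and sums to 1,
    so f is a mixture of Beta densities.
    """
    n = len(coefficients) - 1
    # First moment of Beta(i + 1, n - i + 1) is (i + 1) / (n + 2).
    mean = sum(c * (i + 1) / (n + 2) for i, c in enumerate(coefficients))
    # Second moment is (i + 1)(i + 2) / ((n + 2)(n + 3)).
    second = sum(
        c * (i + 1) * (i + 2) / ((n + 2) * (n + 3))
        for i, c in enumerate(coefficients)
    )
    return mean, math.sqrt(max(second - mean**2, 0.0))


def single_basis(n, i):
    """Coefficient vector with c_i = 1 and all other coefficients 0."""
    return [1.0 if j == i else 0.0 for j in range(n + 1)]


# (n, number of correct answers) for the three curves in Figure 1.
figure_1 = {"red": (3, 2), "blue": (23, 14), "green": (48, 29)}
summary = {name: moments(single_basis(n, i)) for name, (n, i) in figure_1.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: mean={mean:.3f}, std={std:.3f}")
```

Under these assumptions all three curves share the same mean, while the standard deviation decreases from red to blue to green, matching the caption's remark that the degree of certainty varies with the number of observations.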