Parametric Constraints for Bayesian Knowledge Tracing from First Principles

Denis Shchepakin; Sreecharan Sankaranarayanan; Dawn Zimmaro

TL;DR

The paper addresses degenerate or inconsistent parameter estimates that arise when Bayesian Knowledge Tracing (BKT) is fitted with the Expectation-Maximization (EM) algorithm. It derives a concise set of first-principles constraints on the four core BKT parameters $P(L_0)$, $P(G)$, $P(S)$, and $P(R)$: $0 < P(G) < 1$, $0 < P(S) < 1$, $0 < P(R) < 1$, $0 < P(L_t) < 1$, and $P^* < P(L_0) < 1$, where $P^* = \frac{(1 - P(G)) P(R)}{1 - P(S) - P(G)}$. It then proposes a novel EM algorithm that uses an interior-point (barrier) method to maximize the EM objective $\widehat{Q}(\theta|\theta^*)$ under these constraints, guaranteeing non-degenerate parameter estimates. On simulated data, the authors show that this constrained EM can rescue degenerate Baum-Welch solutions and yield feasible, interpretable parameter estimates. The trade-off is a higher per-iteration cost, potentially offset by a reduced need for multiple restarts. Overall, the approach improves the reliability and interpretability of BKT in practice and provides a foundation for extending the constraints to BKT variants and related educational modeling tasks.
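
As a minimal sketch of how the constraints above could be checked for a candidate parameter set, the following Python computes $P^*$ and tests each strict inequality. The function names, the `mu` weight, and the generic logarithmic barrier are illustrative assumptions and are not taken from the paper. The sketch also treats $1 - P(S) - P(G) \le 0$ as infeasible so that $P^*$ stays well defined, and it does not check the per-step condition $0 < P(L_t) < 1$.

```python
import math


def p_star(p_g, p_s, p_r):
    """Lower bound P* = (1 - P(G)) P(R) / (1 - P(S) - P(G))."""
    return (1.0 - p_g) * p_r / (1.0 - p_s - p_g)


def constraint_slacks(p_l0, p_g, p_s, p_r):
    """Return one slack per strict inequality (positive means satisfied).

    Returns None when 1 - P(S) - P(G) <= 0; rejecting that case is an
    assumption of this sketch, made so that P* is well defined.
    """
    if 1.0 - p_s - p_g <= 0.0:
        return None
    star = p_star(p_g, p_s, p_r)
    return [
        p_g, 1.0 - p_g,             # 0 < P(G) < 1
        p_s, 1.0 - p_s,             # 0 < P(S) < 1
        p_r, 1.0 - p_r,             # 0 < P(R) < 1
        p_l0 - star, 1.0 - p_l0,    # P* < P(L_0) < 1
    ]


def is_feasible(p_l0, p_g, p_s, p_r):
    """True when every strict inequality holds."""
    slacks = constraint_slacks(p_l0, p_g, p_s, p_r)
    return slacks is not None and all(s > 0.0 for s in slacks)


def log_barrier(p_l0, p_g, p_s, p_r, mu=1.0):
    """Generic textbook log-barrier penalty: -mu * sum(log(slack)).

    Illustrative only; the paper's actual barrier formulation is not
    reproduced here. Returns +inf outside the feasible region.
    """
    slacks = constraint_slacks(p_l0, p_g, p_s, p_r)
    if slacks is None or any(s <= 0.0 for s in slacks):
        return math.inf
    return -mu * sum(math.log(s) for s in slacks)
```

With the simulation parameters reported in Figure 2 ($P(L_0)=0.45$, $P(R)=0.3$, $P(S)=0.1$, $P(G)=0.25$), `p_star` evaluates to about 0.346, so that parameter set is feasible.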

Abstract

Bayesian Knowledge Tracing (BKT) is a probabilistic model of a learner's state of mastery corresponding to a knowledge component. It considers the learner's state of mastery as a "hidden" or latent binary variable and updates this state based on the observed correctness of the learner's response using parameters that represent transition probabilities between states. BKT is often represented as a Hidden Markov Model and the Expectation-Maximization (EM) algorithm is used to infer these parameters. However, this algorithm can suffer from several issues including producing multiple viable sets of parameters, settling into local minima, producing degenerate parameter values, and a high computational cost during fitting. This paper takes a "from first principles" approach to deriving constraints that can be imposed on the BKT parameter space. Starting from the basic mathematical truths of probability and building up to the behaviors expected of the BKT parameters in real systems, this paper presents a mathematical derivation that results in succinct constraints that can be imposed on the BKT parameter space. Since these constraints are necessary conditions, they can be applied prior to fitting in order to reduce computational cost and the likelihood of issues that can emerge from the EM procedure. In order to see that promise through, the paper further introduces a novel algorithm for estimating BKT parameters subject to the newly defined constraints. While the issue of degenerate parameter values has been reported previously, this paper is the first, to the best of our knowledge, to derive the constraints from first principles while also presenting an algorithm that respects those constraints.

Paper Structure (10 sections, 1 theorem, 62 equations, 3 figures)

Introduction
Defining the BKT Model
Restrictions on the BKT Parameters
Estimating the Parameters
Expectation-Maximization Algorithm
Novel EM Algorithm using the Interior-Point Method
Demonstrating the EM-Interior Point Method on Simulated Data
Discussion
Conclusion and Future Work
Baum-Welch Algorithm

Key Result

Theorem 1

In a sequence of all failed attempts, $P(L_t)$ will asymptotically approach $P^*$ from the right, and in a sequence of all successful attempts, $P(L_t)$ will asymptotically approach $1$ from the left.
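
The limits in Theorem 1 can be motivated by a short fixed-point calculation. The sketch below assumes the standard BKT update with $P(R)$ as the transition probability and assumes $1 - P(S) - P(G) > 0$; it is not the paper's proof, and it only identifies the candidate limits rather than establishing convergence.

```latex
% Assumption: standard BKT update equations, not quoted from the paper.
% Case 1: every attempt fails.
\begin{align*}
P(L_{t+1}) &= \frac{P(L_t)P(S) + (1 - P(L_t))(1 - P(G))P(R)}
                   {P(L_t)P(S) + (1 - P(L_t))(1 - P(G))} \\
\intertext{Setting $P(L_{t+1}) = P(L_t) = x$ and clearing the denominator gives}
0 &= (1 - x)\bigl[(1 - P(S) - P(G))\,x - (1 - P(G))P(R)\bigr], \\
x &= 1 \quad\text{or}\quad x = \frac{(1 - P(G))P(R)}{1 - P(S) - P(G)} = P^*.
\end{align*}

% Case 2: every attempt succeeds.
\begin{align*}
P(L_{t+1}) &= \frac{P(L_t)(1 - P(S)) + (1 - P(L_t))P(G)P(R)}
                   {P(L_t)(1 - P(S)) + (1 - P(L_t))P(G)} \\
\intertext{The same substitution gives}
0 &= (1 - x)\bigl[-(1 - P(S) - P(G))\,x - P(G)P(R)\bigr], \\
x &= 1 \quad\text{or}\quad x = -\frac{P(G)P(R)}{1 - P(S) - P(G)} < 0.
\end{align*}
```

Under these assumptions, the all-failure recursion has the two fixed points $1$ and $P^*$, while the all-success recursion has $1$ as its only fixed point in $(0, 1]$, which is consistent with the limits stated in the theorem.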

Figures (3)

Figure 1: BKT parameters and notation
Figure 2: BKT model fitted to 100 simulated datasets using the classical Baum-Welch algorithm (red triangles) and the proposed EM-Interior Point method (hollow black triangles). The data was simulated using the same parameter values: $P(L_0)=0.45$, $P(R)=0.3$, $P(S)=0.1$, and $P(G)=0.25$. For each dataset, both algorithms used the same random initial parameter guesses. There were 80 datasets for which the fitted parameters satisfied the parameter conditions for both algorithms (datasets [A]; red upward triangles and hollow black upward triangles). For the remaining 20 datasets (datasets [B]), the fitted parameters did not satisfy the conditions when the Baum-Welch algorithm was used (red downward triangles), but they were rescued by the EM-Newton method (hollow black downward triangles).
Figure 3: BKT model fitted to the same dataset from 100 different initial parameter guesses using the classical Baum-Welch algorithm (red triangles) and the proposed EM-Newton method (hollow black triangles). The data was simulated using the same parameter values as in Figure 2. There were 80 initial guesses that converged to parameters satisfying the conditions for both algorithms (datasets [C]; upward triangles), and 20 initial guesses that converged to parameters satisfying the conditions only for the EM-Newton algorithm (datasets [D]; downward triangles). Note that for this simulation, the parameter estimates have lower accuracy but higher precision.
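
To make the simulation setup in the captions concrete, the following is a hedged sketch of how response data could be generated from a BKT model with the stated parameters. The captions do not give the number of learners or attempts per dataset, so `n_learners` and `n_attempts` are placeholders, and the function names, the seeding scheme, and the assumption of no forgetting are illustrative rather than the authors' procedure.

```python
import random


def simulate_learner(n_attempts, p_l0, p_r, p_s, p_g, rng):
    """Simulate one learner's 0/1 responses under classical BKT (no forgetting)."""
    mastered = rng.random() < p_l0           # initial mastery state
    responses = []
    for _ in range(n_attempts):
        if mastered:
            correct = rng.random() >= p_s    # correct unless the learner slips
        else:
            correct = rng.random() < p_g     # correct only by guessing
        responses.append(int(correct))
        if not mastered:
            mastered = rng.random() < p_r    # chance to learn after the attempt
    return responses


def simulate_dataset(n_learners, n_attempts, seed, p_l0, p_r, p_s, p_g):
    """Return a list of response sequences, one per simulated learner."""
    rng = random.Random(seed)                # fixed seed for reproducibility
    return [
        simulate_learner(n_attempts, p_l0, p_r, p_s, p_g, rng)
        for _ in range(n_learners)
    ]
```

A call such as `simulate_dataset(50, 10, seed=0, p_l0=0.45, p_r=0.3, p_s=0.1, p_g=0.25)` would produce one dataset. Repeating it with different seeds yields multiple datasets from the same parameters, in the spirit of Figure 2.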
