Parametric Constraints for Bayesian Knowledge Tracing from First Principles
Denis Shchepakin, Sreecharan Sankaranarayanan, Dawn Zimmaro
TL;DR
The paper tackles degenerate or inconsistent parameter estimates in Bayesian Knowledge Tracing (BKT) under EM-based fitting. It derives a concise set of first-principles constraints on the BKT parameters, covering the four core probabilities $P(L_0)$, $P(G)$, $P(S)$, and $P(R)$. The constraints include $0 < P(G) < 1$, $0 < P(S) < 1$, $0 < P(R) < 1$, $0 < P(L_t) < 1$, and $P^* = \frac{(1 - P(G)) P(R)}{1 - P(S) - P(G)} < P(L_0) < 1$. The paper then proposes a novel EM algorithm that uses an interior-point (barrier) method to maximize the EM objective $\widehat{Q}(\theta|\theta^*)$ under these constraints, guaranteeing non-degenerate parameter estimates. Using simulated data, the authors demonstrate that this constrained EM can rescue degenerate Baum-Welch solutions and yield feasible, interpretable parameter estimates, at a higher per-iteration cost but with a potentially reduced need for multiple restarts. Overall, the approach improves the reliability and interpretability of BKT in practice and provides a foundation for extending to BKT variants and related educational modeling tasks.
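To make the inequalities above concrete, the following is a minimal sketch of a feasibility check for a candidate parameter set. It encodes only the constraints quoted in this summary, not the paper's full derivation or its interior-point EM procedure. The function name `check_bkt_constraints`, its argument names, and the choice to treat a zero denominator in $P^*$ as infeasible are assumptions made for illustration.

```python
def check_bkt_constraints(l0: float, g: float, s: float, r: float) -> bool:
    """Return True if (P(L_0), P(G), P(S), P(R)) satisfy the summarized constraints.

    Hypothetical helper: only the inequalities quoted in the TL;DR are checked.
    """
    # 0 < P(G) < 1, 0 < P(S) < 1, 0 < P(R) < 1
    if not all(0.0 < p < 1.0 for p in (g, s, r)):
        return False

    # Denominator of P* = (1 - P(G)) P(R) / (1 - P(S) - P(G)).
    denom = 1.0 - s - g
    if denom == 0.0:
        # Assumption: P* is undefined here, so reject the parameter set.
        return False

    p_star = (1.0 - g) * r / denom

    # P* < P(L_0) < 1
    return p_star < l0 < 1.0


if __name__ == "__main__":
    print(check_bkt_constraints(l0=0.5, g=0.2, s=0.1, r=0.1))
    print(check_bkt_constraints(l0=0.05, g=0.2, s=0.1, r=0.1))
```

A check like this could be run on the output of an unconstrained fit to flag degenerate solutions; the constrained EM described above instead keeps every iterate inside the feasible region.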
Abstract
Bayesian Knowledge Tracing (BKT) is a probabilistic model of a learner's state of mastery corresponding to a knowledge component. It considers the learner's state of mastery as a "hidden" or latent binary variable and updates this state based on the observed correctness of the learner's response using parameters that represent transition probabilities between states. BKT is often represented as a Hidden Markov Model, and the Expectation-Maximization (EM) algorithm is used to infer these parameters. However, this algorithm can suffer from several issues, including producing multiple viable sets of parameters, settling into a local minimum, producing degenerate parameter values, and a high computational cost during fitting. This paper takes a "from first principles" approach to deriving constraints that can be imposed on the BKT parameter space. Starting from the basic mathematical truths of probability and building up to the behaviors expected of the BKT parameters in real systems, this paper presents a mathematical derivation that results in succinct constraints on that parameter space. Since these constraints are necessary conditions, they can be applied prior to fitting in order to reduce computational cost and the likelihood of issues that can emerge from the EM procedure. To deliver on that promise, the paper further introduces a novel algorithm for estimating BKT parameters subject to the newly defined constraints. While the issue of degenerate parameter values has been reported previously, this paper is the first, to the best of our knowledge, to derive the constraints from first principles while also presenting an algorithm that respects those constraints.
