Bootstrapped Control Limits for Score-Based Concept Drift Control Charts

Jiezhong Wu, Daniel W. Apley

TL;DR

This work tackles concept drift detection in predictive models by monitoring the mean of the model's Fisher score vectors with a score-based MEWMA statistic. It introduces a nested bootstrap procedure that calibrates time-varying control limits while using the full initial sample for model fitting, together with a 0.632-like variance correction that accounts for the finite-sample variability a standard nested bootstrap misses. The approach yields accurate pointwise false-alarm control and is data-efficient, which makes it applicable to complex models, including deep neural networks. Results on a linear mixture example and a nonlinear coupled-oscillator system show reliable drift detection while maintaining the nominal Type I error rate, highlighting the method's practical utility for real-time monitoring of predictive relationships.
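To make "the mean of Fisher score vectors" concrete, the sketch below uses a Gaussian linear regression, where the per-observation score with respect to the coefficients is $\mathbf{s}_i = \mathbf{x}_i (y_i - \mathbf{x}_i^\top \boldsymbol{\theta}) / \sigma^2$. The model, the data, and the helper name `gaussian_linear_scores` are illustrative assumptions, not the paper's setup. At the fitted parameters the training scores average to zero; on data whose predictive relationship has drifted, the score mean moves away from zero, which is the change the MEWMA is meant to detect.

```python
import numpy as np

def gaussian_linear_scores(X, y, theta, sigma2):
    """Hypothetical helper: per-observation score of the log-likelihood of
    y ~ N(X @ theta, sigma2) with respect to theta (sigma2 held fixed)."""
    residuals = y - X @ theta
    return X * (residuals / sigma2)[:, None]

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

# Least squares is the Gaussian MLE for theta.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.mean((y - X @ theta_hat) ** 2)

# Training scores average to (numerically) zero at the fitted parameters.
train_scores = gaussian_linear_scores(X, y, theta_hat, sigma2_hat)
print(train_scores.mean(axis=0))

# New data with a drifted slope (2.0 -> 2.5): the slope component of the
# score mean moves well away from zero.
X_new = np.column_stack([np.ones(n), rng.normal(size=n)])
y_new = X_new @ np.array([1.0, 2.5]) + rng.normal(scale=0.5, size=n)
new_scores = gaussian_linear_scores(X_new, y_new, theta_hat, sigma2_hat)
print(new_scores.mean(axis=0))
```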

Abstract

Monitoring for changes in a predictive relationship represented by a fitted supervised learning model (i.e., concept drift detection) is a widespread problem in modern data-driven applications. A general and powerful Fisher score-based concept drift approach was recently proposed, in which detecting concept drift reduces to detecting changes in the mean of the model's score vector using a multivariate exponentially weighted moving average (MEWMA). To implement the approach, the initial data must be split into two subsets. The first subset serves as the training sample to which the model is fit, and the second subset serves as an out-of-sample test set from which the MEWMA control limit (CL) is determined. In this paper, we retain the same score-based MEWMA monitoring statistic as the existing method and focus instead on improving the computation of the control limit. We develop a novel nested bootstrap procedure for calibrating the CL that allows the entire initial sample to be used for model fitting, thereby yielding a more accurate baseline model while eliminating the need for a large holdout set. We show that a standard nested bootstrap substantially underestimates the variability of the monitoring statistic and develop a 0.632-like correction that appropriately accounts for this. We demonstrate the advantages with numerical examples.
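The monitoring statistic described above can be sketched with the textbook MEWMA recursion $\mathbf{z}_i = \lambda \mathbf{s}_i + (1-\lambda)\mathbf{z}_{i-1}$ and $T_i^2 = \mathbf{z}_i^\top \boldsymbol{\Sigma}_{z_i}^{-1} \mathbf{z}_i$, where, for i.i.d. mean-zero scores with covariance $\boldsymbol{\Sigma}$ and $\mathbf{z}_0 = \mathbf{0}$, $\boldsymbol{\Sigma}_{z_i} = \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]\boldsymbol{\Sigma}$. This is a minimal sketch under those assumptions: $\boldsymbol{\Sigma}$ is taken as known, $\lambda = 0.1$ is an arbitrary choice, the scores are synthetic, and the hypothetical `mewma_t2` does not include the paper's bootstrap control limit or variance correction.

```python
import numpy as np

def mewma_t2(scores, Sigma, lam=0.1):
    """Hypothetical helper: T^2_i for an MEWMA of score vectors, assuming
    i.i.d. mean-zero in-control scores with known covariance Sigma and z_0 = 0."""
    n, p = scores.shape
    Sigma_inv = np.linalg.inv(Sigma)
    z = np.zeros(p)
    t2 = np.empty(n)
    for i in range(1, n + 1):
        # EWMA recursion: z_i = lam * s_i + (1 - lam) * z_{i-1}
        z = lam * scores[i - 1] + (1.0 - lam) * z
        # Exact in-control covariance factor: Cov(z_i) = c_i * Sigma
        c_i = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        t2[i - 1] = z @ Sigma_inv @ z / c_i
    return t2

rng = np.random.default_rng(1)
p = 2
in_control = rng.normal(size=(200, p))         # mean-zero scores (no drift)
drifted = rng.normal(loc=1.0, size=(200, p))   # score mean shifts after i = 200
scores = np.vstack([in_control, drifted])
t2 = mewma_t2(scores, np.eye(p), lam=0.1)
print(t2[:200].mean(), t2[250:].mean())
```

Using the exact time-varying covariance, rather than its steady-state limit $\frac{\lambda}{2-\lambda}\boldsymbol{\Sigma}$, keeps the early $T_i^2$ values on the same scale as later ones; at $i = 1$ the statistic reduces to $\mathbf{s}_1^\top \boldsymbol{\Sigma}^{-1} \mathbf{s}_1$.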

Paper Structure

This paper contains 15 sections, 6 theorems, 98 equations, 5 figures, and 1 algorithm.

Key Result

Theorem 3.1

Suppose the model assumption and the bootstrap assumption (see the Appendix) hold, and that the functions $\boldsymbol{\mu}(\boldsymbol{\theta})$ and $\mathbf{V}(\boldsymbol{\theta})$ are differentiable in a neighborhood of $\boldsymbol{\theta}_0$. Then, for each $i \ge 1$, the unconditional covariance of t … in the sense that …
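The abstract calls the correction "0.632-like," alluding to Efron's .632 bootstrap. The constant itself is the limiting probability that a given observation appears at least once in a bootstrap resample, $1-(1-1/n)^n \to 1 - e^{-1} \approx 0.632$. So models refit on different resamples share a large fraction of their training data, which is one common intuition for why naive bootstrap variability can be too small. The sketch below only illustrates where the constant comes from. The paper's actual inflation factor is the subject of Theorem 3.1 and Corollary 3.2 and is not reproduced here; `prob_in_bootstrap_sample` is a hypothetical helper.

```python
import math
import numpy as np

def prob_in_bootstrap_sample(n):
    """P(a fixed observation appears at least once in a size-n resample drawn
    with replacement from n observations) = 1 - (1 - 1/n)^n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

for n in (10, 100, 1000, 10**6):
    print(n, round(prob_in_bootstrap_sample(n), 4))
print("limit:", round(1.0 - math.exp(-1.0), 4))

# Monte Carlo check: fraction of distinct observations in one bootstrap resample.
rng = np.random.default_rng(3)
n_obs = 100_000
frac_unique = np.unique(rng.integers(0, n_obs, size=n_obs)).size / n_obs
print(frac_unique)
```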

Figures (5)

  • Figure 4.1: Visualization of the predictive relationships for the linear example before and after the shift. The blue points are a scatter plot of $y$ vs. $x$ for the pre-shift single linear model (curve 1), and the orange points correspond to the post-shift mixture of curves 1 and 2. The shift from a single component to a mixture alters $\mathbb{P}(Y|\mathbf{X})$, representing a structural shift in the predictive relationship that is relatively small but can still be detected by our approach.
  • Figure 4.2: (a) Typical monitoring results using our bootstrap CL approach for the linear mixture example, showing detection of concept drift at new observation number 258 (the shift first occurred at observation number 201). The $T^2$ statistics $T_i$ (blue line) remain below the control limit (dashed orange line) before the shift at observation 201 and sharply exceed it after the shift. (b) Comparison of the pointwise false-alarm rate for our bootstrap CL (blue curve) versus the CL of Zhang2023 (orange curve), averaged over 50 dataset replicates under the pre-shift predictive relationship (curve 1). Our bootstrap CL keeps the empirical false-alarm rate much closer to the desired $\alpha = 0.001$.
  • Figure 4.3: Typical monitoring results using our bootstrap CL approach for the coupled nonlinear oscillator system under (a) low-noise and (b) high-noise conditions. The shifts were introduced at $i=201$ and are first detected at observations $i=215$ and $i=221$, respectively.
  • Figure 4.4: Empirical pointwise false-alarm rate (PFAR) for the nonlinear oscillator example under low- and high-noise conditions. The dashed horizontal line indicates the nominal level $\alpha=0.001$. Even under severe model misspecification (high noise), the bootstrap control limits maintain accurate pointwise false-alarm control prior to the change point.
  • Figure B.1: Empirical pointwise false-alarm rate and post-change detection behavior for the two-sample control limits of Zhang2023 in the nonlinear oscillator example. Prior to the change-point ($i \le 200$), the curves represent PFAR; after the change-point, they represent the pointwise probability of detection.
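The pointwise false-alarm rate (PFAR) plotted in these figures is, at each observation index $i$, the fraction of independent monitoring runs whose $T_i$ exceeds the control limit at that index; after the change point, the same computation gives the pointwise probability of detection. A minimal sketch, assuming the $T^2$ paths are stored as a runs-by-time array; the helper name `pointwise_false_alarm_rate` and the toy numbers are hypothetical.

```python
import numpy as np

def pointwise_false_alarm_rate(t2_paths, control_limits):
    """Fraction of monitoring runs whose statistic exceeds the control limit at
    each time index. t2_paths: (n_runs, n_times); control_limits: scalar or
    (n_times,), broadcast across runs."""
    t2_paths = np.asarray(t2_paths, dtype=float)
    exceed = t2_paths > np.asarray(control_limits, dtype=float)
    return exceed.mean(axis=0)

# Toy example: 4 runs, 3 time points, time-varying limits.
t2_paths = np.array([[1.0, 5.0, 2.0],
                     [3.0, 0.5, 9.0],
                     [0.2, 7.0, 1.0],
                     [4.0, 1.0, 1.0]])
limits = np.array([2.5, 4.0, 8.0])
pfar = pointwise_false_alarm_rate(t2_paths, limits)
print(pfar.tolist())  # [0.5, 0.5, 0.25]
```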

Theorems & Definitions (16)

  • Remark 3.1
  • Theorem 3.1: Covariance inflation factor
  • Corollary 3.2: Variance-corrected bootstrap $T^2$ and control limits
  • Theorem 3.3: Stabilization of the bootstrap control limit
  • Remark 3.2
  • Remark 3.3
  • Theorem 3.4: Detectability under model misspecification
  • Remark 3.4
  • Lemma A.1: Moments of the MEWMA statistic under $\mathbb{P}_0$
  • Proof
  • ...and 6 more
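Corollary 3.2 and Theorem 3.3 concern bootstrap-based, time-varying control limits. The final step of any such scheme can be sketched as taking, at each time index $i$, the $(1-\alpha)$ empirical quantile of bootstrap $T_i^2$ values, which targets a pointwise false-alarm rate of $\alpha$. This sketch covers only that quantile step: the paper's nested resampling and refitting and its 0.632-like variance correction are not reproduced. The stand-in "bootstrap" paths are synthetic i.i.d. chi-square(2) draws, and `pointwise_quantile_limits` is a hypothetical helper.

```python
import numpy as np

def pointwise_quantile_limits(boot_t2_paths, alpha=0.001):
    """Time-varying control limits: the (1 - alpha) empirical quantile of the
    bootstrap T^2 values at each time index (rows = bootstrap runs, columns = time)."""
    return np.quantile(np.asarray(boot_t2_paths, dtype=float), 1.0 - alpha, axis=0)

# Synthetic stand-in for bootstrap T^2 paths: i.i.d. chi-square(2) draws.
rng = np.random.default_rng(2)
boot_t2_paths = rng.chisquare(df=2, size=(20000, 50))
limits = pointwise_quantile_limits(boot_t2_paths, alpha=0.001)
# For chi-square(2), the exact 0.999 quantile is -2*ln(0.001), about 13.82.
print(limits.shape, limits.mean())
```

With $\alpha = 0.001$, each limit depends on only about 20 of the 20,000 draws beyond it, so accurate extreme quantiles need many bootstrap paths.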