An Adaptive Sampling Framework for Detecting Localized Concept Drift under Label Scarcity

Junghee Pyeon, Davide Cacciarelli, Kamran Paynabar

TL;DR

PASS detects localized concept drift in regression under label scarcity by integrating residual-informed exploitation, accept–reject exploration, and EWMA-based monitoring into a unified online framework. The method allocates the labeling budget via an $\epsilon$-greedy strategy, targets high-residual regions with residual-weighted inverse transform sampling, and maintains broad coverage through time-weighted exploration. Drift is monitored with two one-sided EWMA charts, one on the top-$r$ residuals and one on residual dispersion. Theoretical properties guarantee that exploitation hits drift regions with positive probability and that exploration cannot neglect any region. Simulations on the Branin, Ishigami, Friedman, and Linkletter functions demonstrate robust detection under abrupt and incremental drifts, outperforming random and score-based baselines. A UK electricity market case study shows that PASS can match full-sampling performance with a fraction of the labels, highlighting its practical value for real-time monitoring under labeling constraints.
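The exploitation side of the summary above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the binomial budget split, and the bandwidth value `h` are all hypothetical, and the Gaussian perturbation of the selected points is only suggested by the form of Proposition 1. The exploration step (accept–reject with time weighting) is not sketched here.

```python
import numpy as np

def split_budget(budget, epsilon, rng):
    """Epsilon-greedy split (assumed form): each label is spent on
    exploration with probability epsilon, otherwise on exploitation."""
    n_explore = int(rng.binomial(budget, epsilon))
    return budget - n_explore, n_explore

def residual_weighted_indices(abs_residuals, n_draws, rng):
    """Inverse transform sampling over point indices, with selection
    probability proportional to the absolute residual."""
    weights = np.asarray(abs_residuals, dtype=float)
    if weights.sum() <= 0:
        # No residual information yet: fall back to uniform indices.
        return rng.integers(0, len(weights), size=n_draws)
    cdf = np.cumsum(weights) / weights.sum()
    u = rng.random(n_draws)
    # First index whose cumulative weight exceeds u.
    idx = np.searchsorted(cdf, u, side="right")
    return np.minimum(idx, len(weights) - 1)  # guard against round-off

def propose_queries(points, abs_residuals, n_draws, h, rng):
    """Pick high-residual points, then jitter them with N(0, h^2 I) noise
    (hypothetical step, motivated by the statement of Proposition 1)."""
    points = np.asarray(points, dtype=float)
    idx = residual_weighted_indices(abs_residuals, n_draws, rng)
    noise = h * rng.standard_normal((n_draws, points.shape[1]))
    return points[idx] + noise

rng = np.random.default_rng(0)
points = rng.random((200, 2))
# Toy residuals: large only where the first coordinate exceeds 0.8.
abs_res = np.where(points[:, 0] > 0.8, 5.0, 0.1)
n_exploit, n_explore = split_budget(20, 0.5, rng)
queries = propose_queries(points, abs_res, n_exploit, 0.05, rng)
```

In this toy setup most exploitation queries land near the strip with large residuals, while the remaining `n_explore` labels would be left to the exploration step.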

Abstract

Concept drift and label scarcity are two critical challenges limiting the robustness of predictive models in dynamic industrial environments. Existing drift detection methods often assume global shifts and rely on dense supervision, making them ill-suited for regression tasks with local drifts and limited labels. This paper proposes an adaptive sampling framework that combines residual-based exploration and exploitation with EWMA monitoring to efficiently detect local concept drift under labeling budget constraints. Empirical results on synthetic benchmarks and a case study on the electricity market demonstrate superior performance in label efficiency and drift detection accuracy.
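The EWMA monitoring step can be illustrated with a standard one-sided (upper) EWMA chart. The sketch below is an assumption-laden example rather than the paper's procedure: the smoothing weight `lam`, the control-limit multiplier `L`, the in-control mean `mu0` and standard deviation `sigma0`, the reset-at-`mu0` rule, and the choice of the mean of the top-$r$ absolute residuals as the monitored statistic are all hypothetical.

```python
import math

def top_r_mean(residuals, r):
    """Mean of the r largest absolute residuals (assumed statistic)."""
    largest = sorted((abs(e) for e in residuals), reverse=True)[:r]
    return sum(largest) / len(largest)

def one_sided_ewma(stats, mu0, sigma0, lam=0.2, L=3.0):
    """Upper one-sided EWMA chart with the asymptotic control limit.

    Returns the first index at which the chart signals, or None."""
    limit = mu0 + L * sigma0 * math.sqrt(lam / (2.0 - lam))
    z = mu0
    for t, x in enumerate(stats):
        # Smooth the new observation; never let the statistic drop below mu0.
        z = max(mu0, (1.0 - lam) * z + lam * x)
        if z > limit:
            return t
    return None

# Toy stream: in control for five steps, then a large upward shift.
stream = [0.0] * 5 + [10.0] * 5
alarm = one_sided_ewma(stream, mu0=0.0, sigma0=1.0)
```

A second chart of the same form applied to a dispersion statistic, such as the log-variance of the residuals, would give the two-chart arrangement the summary describes.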

Paper Structure

This paper contains 18 sections, 2 theorems, 19 equations, 10 figures, 3 tables, 3 algorithms.

Key Result

Proposition 1

Let $\mathcal{R}\subset\mathbb{R}^d$ be a drift region with nonempty interior and fix $h>0$. For any $\mathbf{x}\in\mathcal{R}$, if $\tilde{\mathbf{x}}\sim\mathcal{N}(\mathbf{x},h^2 I_d)$, then $\mathbb{P}(\tilde{\mathbf{x}}\in\mathcal{R})>0$. Moreover, given points $\{\mathbf{x}_i\}_{i=1}^n\subset\

Figures (10)

  • Figure 1: Overview of the proposed framework; details of each step are described in the corresponding subsections.
  • Figure 2: Example of the experimental setting for the Branin function.
  • Figure 3: Spatial evolution of residual-based sampling around an abrupt drift (true change at $t=30$). Heatmap brightness indicates residual weight; yellow crosses are queried points; the green dashed box marks the true drift region. Snapshots at $t=33$, $t=34$ to $38$, and $t=39$ to $43$.
  • Figure 4: Localized abrupt drift with PASS ($\pi_{\mathrm{d}}=1\%$): ARL$_1$ versus drift magnitude $\Delta$ for the top-$r$ absolute-residual and log-variance EWMAs. The two curves track closely.
  • Figure 5: Abrupt drift at drift ratio $\pi_{\mathrm{d}}=1.0\%$: ARL$_1$ versus drift magnitude $\Delta$ (in units of $\sigma_{\text{noise}}$) for PASS ($\epsilon=0.5$, $V_t$ EWMA), Score vector ($\epsilon=1.0,0.5$), and Random ($V_t$ EWMA). Shaded regions denote 95% CIs.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • proof
  • proof