Early Concept Drift Detection via Prediction Uncertainty
Pengqian Lu, Jie Lu, Anjin Liu, Guangquan Zhang
TL;DR
The paper tackles concept drift in streaming data and shows that error-rate-based detectors can miss early distributional changes that leave the error rate unchanged. It introduces the Prediction Uncertainty Index (PU-index), $u_i = 1 - f_{y_i}(x_i)$, and develops PUDD, a drift detector that combines an Adaptive PU-index Bucketing scheme with Pearson's chi-square test. Two theoretical results (Theorem 1 and Theorem 2) establish that the PU-index is at least as sensitive as the error rate and can detect drift that the error rate does not reveal. The approach is validated on synthetic and real-world datasets, including CIFAR-10-CD. Empirically, PUDD often outperforms classic detectors and state-of-the-art methods, with the bucketing strategy providing notable gains, suggesting practical value for early drift monitoring.
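To make the formula $u_i = 1 - f_{y_i}(x_i)$ concrete, here is a minimal Python sketch. It assumes that $f_{y_i}(x_i)$ is the probability the classifier assigns to the true label $y_i$. The helper names (`pu_index`, `error_rate`) and the probability values are hypothetical, chosen only to illustrate a case where the error rate stays flat while the PU-index moves.

```python
def pu_index(probs, labels):
    """Return u_i = 1 - f_{y_i}(x_i) for each sample.

    Assumption: probs[i][k] is the predicted probability of class k
    for sample i, and labels[i] is the true class index.
    """
    return [1.0 - p[y] for p, y in zip(probs, labels)]


def error_rate(probs, labels):
    """Fraction of samples whose argmax prediction differs from the label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda k: p[k])
        if predicted != y:
            wrong += 1
    return wrong / len(labels)


# Hypothetical outputs of a binary classifier on two windows of a stream.
labels = [0, 0, 1, 1]
probs_before = [[0.90, 0.10], [0.80, 0.20], [0.10, 0.90], [0.20, 0.80]]
probs_after = [[0.60, 0.40], [0.55, 0.45], [0.40, 0.60], [0.45, 0.55]]

err_before = error_rate(probs_before, labels)
err_after = error_rate(probs_after, labels)
mean_pu_before = sum(pu_index(probs_before, labels)) / len(labels)
mean_pu_after = sum(pu_index(probs_after, labels)) / len(labels)

print("error rate:", err_before, "->", err_after)
print("mean PU-index:", round(mean_pu_before, 3), "->", round(mean_pu_after, 3))
```

In this toy example every prediction remains correct in both windows, so an error-rate detector has nothing to react to, yet the mean PU-index rises because the classifier has become less confident.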
Abstract
Concept drift, characterized by unpredictable changes in data distribution over time, poses significant challenges to machine learning models in streaming data scenarios. Although error rate-based concept drift detectors are widely used, they often fail to identify drift in the early stages when the data distribution changes but error rates remain constant. This paper introduces the Prediction Uncertainty Index (PU-index), derived from the prediction uncertainty of the classifier, as a superior alternative to the error rate for drift detection. Our theoretical analysis demonstrates that: (1) The PU-index can detect drift even when error rates remain stable. (2) Any change in the error rate will lead to a corresponding change in the PU-index. These properties make the PU-index a more sensitive and robust indicator for drift detection compared to existing methods. We also propose a PU-index-based Drift Detector (PUDD) that employs a novel Adaptive PU-index Bucketing algorithm for detecting drift. Empirical evaluations on both synthetic and real-world datasets demonstrate PUDD's efficacy in detecting drift in structured and image data.
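The detection idea described above, comparing bucketed PU-index values across two windows with Pearson's chi-square test, can be sketched as follows. This is not the authors' Adaptive PU-index Bucketing algorithm, whose details are not given here. As a stand-in it uses fixed equal-width buckets, and the function names, window contents, bucket count, and significance level are all assumptions made for illustration.

```python
def bucket_counts(values, n_buckets):
    """Histogram of values in [0, 1] using equal-width buckets.

    Stand-in for the paper's adaptive bucketing (assumption).
    """
    counts = [0] * n_buckets
    for v in values:
        idx = min(int(v * n_buckets), n_buckets - 1)  # v == 1.0 goes to last bucket
        counts[idx] += 1
    return counts


def pearson_chi2(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x K contingency table."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    total = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # an empty bucket contributes nothing
        exp_a = total_a * col / total
        exp_b = total_b * col / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


# Hypothetical PU-index values for a reference window and a recent window.
reference = [0.05] * 40 + [0.30] * 30 + [0.60] * 20 + [0.90] * 10
recent = [0.05] * 20 + [0.30] * 30 + [0.60] * 30 + [0.90] * 20

N_BUCKETS = 4
# Chi-square critical value for alpha = 0.05 with
# (2 - 1) * (N_BUCKETS - 1) = 3 degrees of freedom.
CRITICAL_VALUE = 7.815

ref_counts = bucket_counts(reference, N_BUCKETS)
rec_counts = bucket_counts(recent, N_BUCKETS)
stat = pearson_chi2(ref_counts, rec_counts)
drift_detected = stat > CRITICAL_VALUE

print("reference buckets:", ref_counts)
print("recent buckets:   ", rec_counts)
print("chi-square statistic:", round(stat, 3), "drift:", drift_detected)
```

The hard-coded critical value keeps the sketch dependency-free. In practice a statistics library would compute the p-value directly, and the degrees of freedom would need adjusting if any bucket is empty in both windows.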
