Sequential Harmful Shift Detection Without Labels
Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso
TL;DR
This work tackles the detection of harmful distribution shifts in continuously deployed models when ground-truth labels are unavailable. It uses a plug-in error estimator $\hat{r}$ as a proxy for the true error $E$, then calibrates a pair of empirical quantiles $(q,\hat{q})$ so that high-error observations can be identified. A sequential testing framework based on time-uniform confidence bounds constructs lower and upper bounds $\hat{L}_q$ and $\hat{U}_q$ (or $\hat{U}_q^2$) and raises an alarm when the estimated harmful-shift risk exceeds the baseline by more than a tolerance $\epsilon_{tol}$; under a mild assumption, the false-alarm probability is controlled at level $\alpha_{source}+\alpha_{prod}$. Experiments on CelebA, synthetic tabular shifts (California housing, Bike Sharing, HELOC, NHANES), and Folktables show that the proposed quantile-based detector achieves favorable trade-offs between power and false discovery proportion (FDP) and detects shifts early, even when the error estimator is imperfect. Overall, the paper provides a principled, label-free framework for sequential harmful shift detection, with theoretical guarantees and broad empirical validation, that lets production systems without immediate access to labels intervene in time without inflating false-alarm rates.
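The alarm rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each observation has already been reduced to a 0/1 proxy flag ("estimated high error"), it bounds the source flag rate with a fixed-sample Hoeffding upper bound at level $\alpha_{source}$, and it bounds the production flag rate with a time-uniform lower bound at level $\alpha_{prod}$. The paper builds on tighter time-uniform bounds; this sketch swaps in a simple union-bound Hoeffding construction that is valid but conservative. The names `hoeffding_upper` and `detect` and all parameter values are hypothetical.

```python
import math
import random


def hoeffding_upper(mean, n, alpha):
    """One-sided fixed-sample Hoeffding upper bound for the mean of [0, 1]-valued data."""
    return mean + math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def detect(source_flags, prod_flags, eps_tol=0.05, alpha_source=0.025, alpha_prod=0.025):
    """Return the first production time t at which an alarm fires, or None.

    source_flags: 0/1 proxy "high error" indicators on held-out source data.
    prod_flags:   0/1 proxy indicators, in the order they arrive in production.
    (Hypothetical interface for illustration only.)
    """
    n = len(source_flags)
    # Upper bound on the source (baseline) flag rate, valid w.p. >= 1 - alpha_source.
    u_source = hoeffding_upper(sum(source_flags) / n, n, alpha_source)
    total = 0
    for t, z in enumerate(prod_flags, start=1):
        total += z
        # Spend alpha_prod / (t (t + 1)) at step t; these sum to alpha_prod over all t,
        # so by a union bound the lower bound holds at every t simultaneously.
        alpha_t = alpha_prod / (t * (t + 1))
        l_prod = total / t - math.sqrt(math.log(1.0 / alpha_t) / (2.0 * t))
        # Alarm: production rate is confidently above the baseline plus the tolerance.
        if l_prod > u_source + eps_tol:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    source = [int(rng.random() < 0.10) for _ in range(5000)]
    benign = [int(rng.random() < 0.10) for _ in range(5000)]   # same flag rate as source
    harmful = [int(rng.random() < 0.40) for _ in range(5000)]  # flag rate quadrupled
    print("benign :", detect(source, benign))
    print("harmful:", detect(source, harmful))
```

A false alarm requires either the source bound or the production bound to fail, so a union bound gives total false-alarm probability at most $\alpha_{source}+\alpha_{prod}$. The guarantee in this sketch concerns the proxy flag rate only; it does not model how the proxy relates to the true error $E$.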
Abstract
We introduce a novel approach for detecting distribution shifts that degrade the performance of machine learning models in continuous production environments, without requiring access to ground-truth labels. It builds upon the work of Podkopaev and Ramdas [2022], who address the setting where labels are available for tracking model errors over time. Our solution extends this framework to the label-free setting by using a proxy for the true error, derived from the predictions of a trained error estimator. Experiments show that our method achieves high power while controlling false alarms under various distribution shifts, including covariate shift, label shift, and natural shifts over geography and time.
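The error proxy mentioned in the abstract can be sketched as follows, assuming (hypothetically) that $q$ is the empirical quantile of the true error $E$ on labeled source data and $\hat{q}$ is the empirical quantile of the estimator outputs $\hat{r}(x)$ at the same level, so that an unlabeled observation is flagged as high-error when $\hat{r}(x) > \hat{q}$. The function names, the quantile level, and this matching rule are illustrative assumptions, not the paper's exact calibration procedure.

```python
import math
import random


def empirical_quantile(values, level):
    """Return the smallest v in `values` with at least a `level` fraction of values <= v."""
    s = sorted(values)
    k = max(0, math.ceil(level * len(s)) - 1)
    return s[k]


def calibrate(true_errors, est_errors, level=0.9):
    """Hypothetical calibration on labeled source data.

    q     : `level`-quantile of the true errors E (needs labels, source data only).
    q_hat : `level`-quantile of the estimator outputs r_hat at the same points.
    """
    return empirical_quantile(true_errors, level), empirical_quantile(est_errors, level)


def proxy_flags(est_errors, q_hat):
    """Label-free proxy for 'high error': 1 if r_hat(x) exceeds q_hat, else 0."""
    return [int(r > q_hat) for r in est_errors]


if __name__ == "__main__":
    rng = random.Random(0)
    true_src = [rng.random() for _ in range(2000)]        # simulated true errors E
    est_src = [e + rng.gauss(0, 0.1) for e in true_src]   # imperfect estimator r_hat
    q, q_hat = calibrate(true_src, est_src, level=0.9)
    flags = proxy_flags(est_src, q_hat)
    truth = [int(e > q) for e in true_src]
    agreement = sum(f == y for f, y in zip(flags, truth)) / len(truth)
    print(f"q={q:.3f} q_hat={q_hat:.3f} flagged={sum(flags) / len(flags):.3f} agreement={agreement:.3f}")
```

Labels are needed only once, on source data, to fix $q$ and $\hat{q}$; afterwards the 0/1 flags can be computed for production inputs from $\hat{r}$ alone, which is what makes them usable as a label-free monitoring signal.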
