Sequential Harmful Shift Detection Without Labels

Salim I. Amoukou; Tom Bewley; Saumitra Mishra; Freddy Lecue; Daniele Magazzeni; Manuela Veloso

TL;DR

This work detects harmful distribution shifts in continuous production without access to ground-truth labels. A plug-in error estimator $\hat{r}$ serves as a proxy for the true error $E$, and calibration over empirical quantiles $(q,\hat{q})$ identifies high-error observations. A sequential testing framework based on time-uniform confidence bounds constructs lower and upper bounds $\hat{L}_q$ and $\hat{U}_q$ (or $\hat{U}_q^2$) and raises an alarm when the estimated harmful-shift risk exceeds the baseline by more than a tolerance $\epsilon_{tol}$; under a mild assumption, false alarms are controlled at level $\alpha_{source}+\alpha_{prod}$. Experiments on CelebA, synthetic tabular shifts (California housing, Bike Sharing, HELOC, NHANES), and Folktables show that the quantile-based detector achieves favorable power-FDP trade-offs and detects shifts early, even when the error estimator is imperfect. The approach gives production systems that cannot access labels immediately a practical online monitor, enabling timely intervention without inflating the false-alarm rate.
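To make the monitoring loop concrete, the following is a minimal sketch of a label-free sequential alarm, not the paper's actual procedure. It assumes the error estimator's outputs are already available as arrays of scores, picks a threshold at a source quantile, and compares the production rate of above-threshold scores with the source rate. The confidence radii are plain Hoeffding bounds made time-uniform by a union bound over time steps, swapped in here for the paper's bounds $\hat{L}_q$ and $\hat{U}_q$. All function and parameter names (`monitor`, `calib_scores`, `q_hat`, and so on) are hypothetical.

```python
import math
import numpy as np


def hoeffding_radius(n, alpha):
    """One-sided Hoeffding radius for the mean of n values in [0, 1]."""
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def time_uniform_radius(t, alpha):
    """Radius valid simultaneously for all t >= 1.

    Assumption: a crude union bound that spends alpha * 6 / (pi^2 * t^2)
    at step t (these terms sum to alpha). The paper uses tighter bounds.
    """
    alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
    return hoeffding_radius(t, alpha_t)


def monitor(calib_scores, source_scores, prod_scores,
            q_hat=0.9, eps_tol=0.05, alpha_source=0.05, alpha_prod=0.05):
    """Return the first production step that raises an alarm, or None.

    calib_scores:  estimator outputs used only to choose the threshold.
    source_scores: estimator outputs on source data, used for the baseline
                   (ideally held out from calib_scores).
    prod_scores:   stream of estimator outputs in production (no labels).
    """
    threshold = np.quantile(np.asarray(calib_scores, dtype=float), q_hat)

    # Upper confidence bound on the source rate of "high-error" flags.
    source_scores = np.asarray(source_scores, dtype=float)
    source_rate = float(np.mean(source_scores > threshold))
    upper_source = min(
        1.0, source_rate + hoeffding_radius(len(source_scores), alpha_source)
    )

    flagged = 0
    for t, score in enumerate(prod_scores, start=1):
        flagged += int(score > threshold)
        # Time-uniform lower confidence bound on the production flag rate.
        lower_prod = flagged / t - time_uniform_radius(t, alpha_prod)
        if lower_prod > upper_source + eps_tol:
            return t
    return None
```

For example, `monitor(calib, source, stream)` returns `None` for a stream whose scores never exceed the threshold, and an early step index for a stream whose scores all exceed it. In this sketch a false alarm requires either the source bound or the production bound to fail, so a union bound gives a false-alarm level of at most `alpha_source + alpha_prod` for the flag-rate comparison.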

Abstract

We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments; the approach requires no access to ground-truth labels. It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios where labels are available for tracking model errors over time. Our solution extends this framework to the label-free setting by using a proxy for the true error, derived from the predictions of a trained error estimator. Experiments show that our method achieves high power while controlling false alarms under various distribution shifts, including covariate and label shifts and natural shifts over geography and time.
