A Synthetic Benchmark to Explore Limitations of Localized Drift Detections
Flavio Giobergia, Eliana Pastor, Luca de Alfaro, Elena Baralis
TL;DR
This paper challenges the common assumption that concept drift is global by examining localized drift, where only a subpopulation of the data drifts. It introduces the Subgroup Agrawal Drift Dataset, a synthetic benchmark built on the Agrawal generator in which a randomly selected subgroup undergoes gradual drift. Labels follow $F = s(x) \cdot [Z \cdot f_i(x) + (1 - Z) \cdot f_j(x)] + (1 - s(x)) \cdot f_i(x)$, where $s(x)$ indicates membership in the drifting subgroup, $Z$ is a binary variable that selects between the labeling functions $f_i$ and $f_j$, and the sigmoid $p_t = (1 + e^{-4(t-k)/w})^{-1}$, centered at $k$ with width $w$, sets the pace of the transition. This construction enables controlled evaluation of local-drift detectors. The authors evaluate four drift detectors (DDM, EDDM, HDDM, and FHDDM) across drifting-subgroup sizes from $1\%$ to $100\%$. Detection performance collapses for small subgroups because of a high number of false negatives, while false positives remain relatively stable. The work provides public code for generating the benchmark and highlights the need for detectors that explicitly handle localized drift, with implications for fairness and subpopulation-aware monitoring.
Abstract
Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. Traditionally, drift is assumed to occur globally, affecting the entire dataset uniformly. However, this assumption does not always hold true in real-world scenarios where only specific subpopulations within the data may experience drift. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes. We introduce a synthetic dataset based on the Agrawal generator, where drift is induced in a randomly chosen subgroup. Our experiments demonstrate that commonly adopted drift detection methods may fail to detect drift when it is confined to a small subpopulation. We propose and test various drift detection approaches to quantify their effectiveness in this localized drift scenario. We make the source code for the generation of the synthetic benchmark available at https://github.com/fgiobergia/subgroup-agrawal-drift.
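The labeling rule given in the TL;DR can be made concrete with a minimal Python sketch. It is illustrative only and is not taken from the authors' repository. The text does not state how $Z$ relates to $p_t$, so the sketch assumes $Z \sim \mathrm{Bernoulli}(1 - p_t)$, which moves the subgroup from $f_i$ to $f_j$ as $t$ grows. The names `drift_probability`, `drifted_label`, `f_old`, `f_new`, and `in_subgroup` are hypothetical, and the toy labeling functions are stand-ins rather than the Agrawal classification functions.

```python
import math
import random


def drift_probability(t, k, w):
    """Sigmoid p_t = 1 / (1 + exp(-4 (t - k) / w)), evaluated without overflow."""
    u = 4.0 * (t - k) / w
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)  # underflows to 0.0 for very negative u instead of overflowing
    return e / (1.0 + e)


def drifted_label(x, t, f_i, f_j, s, k, w, rng):
    """Evaluate F = s(x) [Z f_i(x) + (1 - Z) f_j(x)] + (1 - s(x)) f_i(x)."""
    if not s(x):
        # s(x) = 0: points outside the subgroup always keep the original concept.
        return f_i(x)
    # ASSUMPTION (not stated in the text): Z ~ Bernoulli(1 - p_t), so the
    # subgroup is labeled by f_i before the drift and by f_j after it.
    z = 1 if rng.random() < 1.0 - drift_probability(t, k, w) else 0
    return z * f_i(x) + (1 - z) * f_j(x)


# Hypothetical toy stand-ins, NOT the Agrawal functions or the paper's subgroups.
def f_old(x):
    return int(x["age"] < 40)


def f_new(x):
    return int(x["salary"] > 50_000)


def in_subgroup(x):
    return x["age"] >= 60


rng = random.Random(0)
point = {"age": 65, "salary": 80_000}
for t in (0, 5_000, 10_000):
    label = drifted_label(point, t, f_old, f_new, in_subgroup, k=5_000, w=1_000, rng=rng)
    print(t, label)
```

Under this assumption, a point inside the subgroup is labeled almost surely by `f_old` well before $t = k$ and by `f_new` well after it, with a random mix near $t = k$; points outside the subgroup are unaffected at every $t$.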
