Drift Detection: Introducing Gaussian Split Detector
Maxime Fuccellaro, Laurent Simon, Akka Zemmari
TL;DR
This work tackles concept drift detection without access to true labels during inference by introducing Gaussian Split Detector (GSD), a batch-mode method that builds an ensemble of Gaussian-based feature splits to learn training-time decision boundaries. During inference, it uses the EM algorithm to fit a Gaussian mixture for each feature and computes updated decision boundaries, signaling drift when boundary shifts exceed a threshold across enough splits. Empirical results show GSD effectively detects real drift while largely ignoring virtual drift, outperforming or matching several unsupervised baselines on real and synthetic datasets, particularly in high-dimensional settings. The approach leverages boundary changes rather than error rates, offering a practical, label-free solution for real-world drift monitoring with potential extensions to non-Gaussian and multiclass scenarios.
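The boundary-shift mechanism summarized above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the names (`GaussianSplitSketch`, `fit_gmm_1d`, `boundary`, `make_batch`), the use of one split per feature, binary labels, a threshold measured in pooled training standard deviations, and the "fraction of shifted splits" vote are all assumptions made for illustration. Each feature's training boundary comes from labeled class Gaussians; at inference, a two-component mixture is fitted by EM (warm-started from the training components so the component order is preserved), and drift is flagged when enough boundaries move.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    weight: float
    mean: float
    var: float


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def boundary(c1, c2):
    """Point where the weighted densities are equal, closest to the means' midpoint."""
    a = 1 / (2 * c2.var) - 1 / (2 * c1.var)
    b = c1.mean / c1.var - c2.mean / c2.var
    c = (c2.mean ** 2 / (2 * c2.var) - c1.mean ** 2 / (2 * c1.var)
         + math.log(c1.weight * math.sqrt(c2.var) / (c2.weight * math.sqrt(c1.var))))
    mid = (c1.mean + c2.mean) / 2
    if abs(a) < 1e-12:  # (near-)equal variances: the equation is linear
        return -c / b if abs(b) > 1e-12 else mid
    disc = b * b - 4 * a * c
    if disc < 0:  # weighted densities never cross: fall back to the midpoint
        return mid
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))  # numerically stable roots
    roots = [q / a] + ([c / q] if q != 0 else [])
    return min(roots, key=lambda r: abs(r - mid))


def fit_gmm_1d(xs, init, n_iter=50):
    """EM for a two-component 1D Gaussian mixture, warm-started from init."""
    comps = [Component(c.weight, c.mean, c.var) for c in init]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [c.weight * gaussian_pdf(x, c.mean, c.var) for c in comps]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, then variance around the new mean.
        for k, c in enumerate(comps):
            nk = sum(r[k] for r in resp) + 1e-12
            c.weight = nk / n
            c.mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            c.var = max(sum(r[k] * (x - c.mean) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return comps


def class_components(xs, ys):
    """Per-class Gaussian for one feature, estimated from labeled training data."""
    comps = []
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        var = max(sum((v - mean) ** 2 for v in vals) / len(vals), 1e-6)
        comps.append(Component(len(vals) / len(xs), mean, var))
    return comps


class GaussianSplitSketch:
    """Hypothetical detector: one split per feature, drift if enough boundaries move."""

    def __init__(self, threshold=0.5, min_fraction=0.5):
        self.threshold = threshold        # shift, in pooled training std units
        self.min_fraction = min_fraction  # fraction of splits that must shift
        self.splits = []

    def fit(self, X, y):
        self.splits = []
        for j in range(len(X[0])):
            comps = class_components([row[j] for row in X], y)
            scale = math.sqrt((comps[0].var + comps[1].var) / 2)
            self.splits.append((comps, boundary(*comps), scale))
        return self

    def detect(self, X):
        """Unlabeled batch in, True if enough per-feature boundaries shifted."""
        shifted = 0
        for j, (comps, b_train, scale) in enumerate(self.splits):
            new = fit_gmm_1d([row[j] for row in X], comps)
            if abs(boundary(*new) - b_train) / scale > self.threshold:
                shifted += 1
        return shifted / len(self.splits) >= self.min_fraction


def make_batch(mean1, n=200, seed=0):
    """Hypothetical 2-feature data: class 0 ~ N(0, 1), class 1 ~ N(mean1, 1)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    X += [[rng.gauss(mean1, 1), rng.gauss(mean1, 1)] for _ in range(n)]
    return X, [0] * n + [1] * n


if __name__ == "__main__":
    X_train, y_train = make_batch(4.0, seed=0)
    detector = GaussianSplitSketch().fit(X_train, y_train)
    print(detector.detect(make_batch(4.0, seed=1)[0]))  # same distribution
    print(detector.detect(make_batch(7.0, seed=2)[0]))  # class-1 mean moved
```

Warm-starting EM from the training components is a design choice of this sketch: it keeps "component 0" aligned with class 0, so the boundary before and after inference can be compared directly without solving a label-matching problem.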
Abstract
Recent research has yielded a wide array of drift detectors. However, their strong performance typically depends on the true class labels being available during the drift detection phase. This paper targets drift detection when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD), a novel drift detector that works in batch mode. GSD is designed for data that follow a normal distribution and uses Gaussian mixture models to monitor changes in the decision boundary. The algorithm handles multidimensional data streams and requires no ground-truth labels during the inference phase, which makes it relevant for real-world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift, which is key to avoiding false alarms.
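For a single feature modeled by two Gaussian components, one natural definition of the decision boundary is the point where the weighted densities are equal. The following derivation uses our own notation (weights $\pi_1, \pi_2$, means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$) and the drift rule with threshold $\tau$ is an illustrative assumption, not necessarily the exact criterion used by GSD. Taking logarithms of the equality turns it into a quadratic in $x$:

```latex
\pi_1\,\mathcal{N}(x;\mu_1,\sigma_1^2) = \pi_2\,\mathcal{N}(x;\mu_2,\sigma_2^2)
\iff a x^2 + b x + c = 0, \quad\text{where}

a = \frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2}, \qquad
b = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}, \qquad
c = \frac{\mu_2^2}{2\sigma_2^2} - \frac{\mu_1^2}{2\sigma_1^2} + \ln\frac{\pi_1\,\sigma_2}{\pi_2\,\sigma_1}.

% Equal variances and equal weights give a = 0 and c/b = -(\mu_1+\mu_2)/2, so
x^\star = \frac{\mu_1 + \mu_2}{2}.

% Illustrative drift rule: compare the boundary fitted on the unlabeled batch
% with the one learned at training time.
\text{drift on this split} \iff \left| x^\star_{\text{batch}} - x^\star_{\text{train}} \right| > \tau
```

When $a \neq 0$ the quadratic has up to two roots; the one lying between (or closest to) the two means is the meaningful boundary.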
