Drift Detection: Introducing Gaussian Split Detector

Maxime Fuccellaro, Laurent Simon, Akka Zemmari

TL;DR

This work tackles concept drift detection without access to true labels during inference by introducing Gaussian Split Detector (GSD), a batch-mode method that builds an ensemble of Gaussian-based feature splits to learn training-time decision boundaries. During inference, it uses the EM algorithm to fit a Gaussian mixture for each feature and computes updated decision boundaries, signaling drift when boundary shifts exceed a threshold across enough splits. Empirical results show GSD effectively detects real drift while largely ignoring virtual drift, outperforming or matching several unsupervised baselines on real and synthetic datasets, particularly in high-dimensional settings. The approach leverages boundary changes rather than error rates, offering a practical, label-free solution for real-world drift monitoring with potential extensions to non-Gaussian and multiclass scenarios.
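The per-feature step described above (fit a two-component Gaussian mixture with EM, then read off a decision boundary) can be sketched in pure Python. This is an illustrative stand-in, not the authors' code. `fit_gmm_1d` and `weighted_boundary` are hypothetical names, and the quartile initialisation is our own choice. We also assume, based on Figure 1, that the boundary is the point between the two means where the weighted component densities are equal.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture with EM (hypothetical helper).

    Returns (weights, means, variances), each a 2-element list.
    """
    n = len(xs)
    s = sorted(xs)
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    # Initialise the two means at the lower and upper quartiles (our choice).
    w, mu, var = [0.5, 0.5], [s[n // 4], s[(3 * n) // 4]], [var0, var0]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-9)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return w, mu, var

def weighted_boundary(w, mu, var):
    """Point where w0*N(mu0, var0) equals w1*N(mu1, var1).

    Assumption: the boundary is this crossing point (our reading of Figure 1).
    Returns None when the weighted densities do not cross.
    """
    # Log-density equality rearranged into a*x^2 + b*x + c = 0.
    a = 1 / (2 * var[0]) - 1 / (2 * var[1])
    b = mu[1] / var[1] - mu[0] / var[0]
    c = (mu[0] ** 2 / (2 * var[0]) - mu[1] ** 2 / (2 * var[1])
         - math.log(w[0] / w[1]) + 0.5 * math.log(var[0] / var[1]))
    if abs(a) < 1e-12:  # equal variances: the equation is linear
        return -c / b if b != 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    r = math.sqrt(disc)
    roots = [(-b + r) / (2 * a), (-b - r) / (2 * a)]
    lo, hi = sorted(mu)
    # Prefer the root lying between the two means.
    inside = [x for x in roots if lo <= x <= hi]
    return inside[0] if inside else min(roots, key=lambda x: abs(x - (lo + hi) / 2))

# Usage: a symmetric two-class feature, so the boundary should land near 2.0.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(500)] + [random.gauss(4, 1) for _ in range(500)]
print(weighted_boundary(*fit_gmm_1d(xs)))
```

Fitting the same model on a training batch and on an inference batch gives two boundaries per feature; their difference is the kind of boundary shift that the detector compares against a threshold.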

Abstract

Recent research has yielded a wide array of drift detectors. However, their strong performance depends on the true class labels being available during the drift detection phase. This paper targets drift detection when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD), a novel drift detector that works in batch mode. GSD is designed for data that follow a normal distribution and uses Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimensional data streams and to work without ground-truth labels during the inference phase, which makes it suitable for real-world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift, which is key to avoiding false alarms.
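Monitoring the decision boundary, as the TL;DR describes it, ends in a vote: drift is signalled when enough splits see their boundary move by more than a threshold. The sketch below shows only that voting step. The function name `detect_drift`, the parameters `threshold` and `min_splits`, and the use of an absolute shift are illustrative assumptions, not the paper's exact rule.

```python
def detect_drift(train_boundaries, new_boundaries, threshold, min_splits):
    """Hypothetical vote across splits.

    Drift is flagged when at least `min_splits` boundaries moved by more than
    `threshold` (absolute shift; an assumption, the paper's criterion may differ).
    Returns (drift_flag, per-split list of moved/not-moved).
    """
    if len(train_boundaries) != len(new_boundaries):
        raise ValueError("one new boundary is needed per training split")
    moved = [abs(new - old) > threshold
             for old, new in zip(train_boundaries, new_boundaries)]
    return sum(moved) >= min_splits, moved

# Usage: two of the three splits moved by more than 0.5.
drift, moved = detect_drift([2.0, 0.5, -1.0], [2.1, 1.4, -2.2], threshold=0.5, min_splits=2)
print(drift, moved)  # True [False, True, True]
```

Requiring several splits to agree, rather than reacting to a single boundary, is one way to keep a noisy estimate on one feature from raising a false alarm on its own.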

Paper Structure

This paper contains 12 sections, 12 equations, 2 figures, 4 tables, and 1 algorithm.

Figures (2)

  • Figure 1: In this figure, we plot the weighted density functions for the positive (in green) and negative (in red) samples. The decision boundary $\alpha$ is marked by the black line.
  • Figure 2: Illustration of real and virtual drift
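Figure 1 suggests that $\alpha$ sits where the two weighted densities cross. Assuming that reading (ours, not a definition quoted from the paper), and writing $\pi_\pm$, $\mu_\pm$ and $\sigma_\pm$ for the weight, mean and standard deviation of the positive and negative components, $\alpha$ solves a quadratic equation:

$$
\pi_+\,\mathcal{N}(\alpha;\mu_+,\sigma_+^2) = \pi_-\,\mathcal{N}(\alpha;\mu_-,\sigma_-^2)
\quad\Longleftrightarrow\quad
\frac{(\alpha-\mu_+)^2}{2\sigma_+^2} - \frac{(\alpha-\mu_-)^2}{2\sigma_-^2} = \ln\frac{\pi_+}{\pi_-} - \ln\frac{\sigma_+}{\sigma_-}
$$

When $\sigma_+ = \sigma_- = \sigma$ the quadratic terms cancel and

$$
\alpha = \frac{\mu_+ + \mu_-}{2} + \frac{\sigma^2}{\mu_- - \mu_+}\,\ln\frac{\pi_+}{\pi_-}.
$$

Since $\pi_y\,p(x \mid y) = p(y \mid x)\,p(x)$, this crossing point is where $p(+ \mid x) = p(- \mid x)$. Under the standard definitions illustrated in Figure 2, real drift changes $p(y \mid x)$ and can therefore move $\alpha$, whereas virtual drift changes $p(x)$ alone and in principle leaves $\alpha$ in place.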