DriftGAN: Using historical data for Unsupervised Recurring Drift Detection

Christofer Fellicious, Sahib Julka, Lorenz Wendlinger, Michael Granitzer

TL;DR

Concept drift in streaming data includes recurring drifts, which traditional detectors do not handle well. DriftGAN is a GAN-based, unsupervised framework with a multiclass discriminator that assigns inputs to the current or a past drift distribution, and a generator that forecasts the next vector in the sequence. The discriminator can be extended with new drift classes, so data from past drifts can be reused to accelerate retraining and reduce labeling needs. Empirical results on eight drift datasets and a NASA MESSENGER use case show DriftGAN achieving higher accuracy than prior unsupervised methods on most datasets, with faster adaptation when drifts recur and applicability across domains. The approach accepts some growth in training time in exchange for significantly faster recovery; future work is envisioned around conditional GANs to simulate distributions more effectively and around broader real-world deployments.
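
The idea of a discriminator that gains one output class per newly detected drift can be illustrated with a small sketch. This is not the authors' implementation: the class `ExpandableHead`, its method names, and the initialization scale are hypothetical, and a plain linear softmax layer stands in for the output layer of a real GAN discriminator. The sketch only shows how an output layer might grow while keeping the parameters already learned for past drifts.

```python
import math
import random


class ExpandableHead:
    """Linear softmax head whose number of output classes can grow.

    Hypothetical stand-in for a multiclass discriminator's output layer;
    one class per known drift distribution.
    """

    def __init__(self, n_features, n_classes=1, seed=0):
        self.n_features = n_features
        self._rng = random.Random(seed)
        self.weights = [self._new_row() for _ in range(n_classes)]
        self.biases = [0.0] * n_classes

    def _new_row(self):
        # Small random initialization (scale 0.01 is an arbitrary assumption).
        return [self._rng.gauss(0.0, 0.01) for _ in range(self.n_features)]

    @property
    def n_classes(self):
        return len(self.weights)

    def add_class(self):
        """Append one output unit for a new drift; existing rows are untouched."""
        self.weights.append(self._new_row())
        self.biases.append(0.0)
        return self.n_classes - 1  # index of the new drift class

    def predict_proba(self, x):
        """Softmax over per-class linear scores for one input vector."""
        logits = [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        """Index of the most probable drift class."""
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)


head = ExpandableHead(n_features=3)   # starts with a single distribution
new_index = head.add_class()          # a new drift is detected
probs = head.predict_proba([0.5, -1.0, 2.0])
```

In a full system the new row would then be trained on data from the new drift, while the rows for earlier drifts keep what was learned from historical data.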

Abstract

In real-world applications, input data distributions are rarely static over time, a phenomenon known as concept drift. Such drifts degrade a model's prediction performance, so methods are needed to counter them. The first step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detection methods focus on detecting drifts and signalling that the model needs retraining. However, in real-world cases, concept drifts can recur over time. In this paper, we present an unsupervised method based on Generative Adversarial Networks (GANs) to detect concept drifts and identify whether a specific concept drift occurred in the past. Our method reduces the time and data the model requires to get up to speed for recurring drifts. Our key results indicate that the proposed model outperforms current state-of-the-art models on most datasets. We also test our method on a real-world use case from astrophysics, where we detect bow shock and magnetopause crossings with better results than existing methods in the domain.

Paper Structure (15 sections, 3 equations, 2 tables)