Autonomous Concept Drift Threshold Determination

Pengqian Lu, Jie Lu, Anjin Liu, En Yu, Guangquan Zhang

TL;DR

The paper tackles the problem of choosing drift-detection thresholds in non-stationary data streams. It shows theoretically that no single fixed threshold is optimal across all datasets and drift patterns, and proves that a dynamic threshold strategy, built by combining the best threshold for each data segment, can outperform any single fixed threshold. To realize this, it introduces the Dynamic Threshold Determination (DTD) algorithm, which runs a comparison phase with three candidate models to adapt the threshold based on observed performance. Extensive experiments on real-world and synthetic datasets show that DTD consistently improves drift detector performance, often rivaling or surpassing SOTA methods, and is robust to its hyperparameters. The work offers a practical, theoretically grounded way to maintain model performance in evolving data environments, with potential extensions to end-to-end loss formulations and large-model adaptation.

Abstract

Existing drift detection methods focus on designing sensitive test statistics. They treat the detection threshold as a fixed hyperparameter, set once to balance false alarms and late detections, and applied uniformly across all datasets and over time. However, maintaining model performance is the key objective from the perspective of machine learning, and we observe that model performance is highly sensitive to this threshold. This observation inspires us to investigate whether a dynamic threshold could be provably better. In this paper, we prove that a threshold that adapts over time can outperform any single fixed threshold. The main idea of the proof is that a dynamic strategy, constructed by combining the best threshold from each individual data segment, is guaranteed to outperform any single threshold that applies to all segments. Based on the theorem, we propose a Dynamic Threshold Determination algorithm. It enhances existing drift detection frameworks with a novel comparison phase to inform how the threshold should be adjusted. Extensive experiments on a wide range of synthetic and real-world datasets, including both image and tabular data, validate that our approach substantially enhances the performance of state-of-the-art drift detectors.
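
The proof idea in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's DTD algorithm: the accuracy table is invented for illustration, the names `best_fixed` and `segment_wise_oracle` are hypothetical, and it assumes that the accuracy on one segment does not depend on the threshold used in other segments. It compares the best single threshold applied to every segment with a strategy that picks the best threshold per segment.

```python
# Toy illustration only: these accuracy values are made up, not taken from the paper.
# Rows are data segments, columns are candidate thresholds.
thresholds = [0.01, 0.05, 0.10]
segment_accuracy = [
    [0.80, 0.70, 0.60],  # segment 0 favours the smallest threshold
    [0.55, 0.75, 0.65],  # segment 1 favours the middle threshold
    [0.50, 0.60, 0.85],  # segment 2 favours the largest threshold
]


def best_fixed(acc):
    """Return (index, mean accuracy) of the best threshold shared by all segments."""
    n_seg, n_thr = len(acc), len(acc[0])
    means = [sum(row[j] for row in acc) / n_seg for j in range(n_thr)]
    best_j = max(range(n_thr), key=means.__getitem__)
    return best_j, means[best_j]


def segment_wise_oracle(acc):
    """Return (per-segment indices, mean accuracy) when each segment uses its own best threshold."""
    choices = [max(range(len(row)), key=row.__getitem__) for row in acc]
    mean = sum(row[j] for row, j in zip(acc, choices)) / len(acc)
    return choices, mean


fixed_idx, fixed_mean = best_fixed(segment_accuracy)
dyn_choices, dyn_mean = segment_wise_oracle(segment_accuracy)

print("best fixed threshold:", thresholds[fixed_idx], "mean accuracy:", round(fixed_mean, 4))
print("per-segment thresholds:", [thresholds[j] for j in dyn_choices],
      "mean accuracy:", round(dyn_mean, 4))
```

In this toy table no single column is best for every row, so the per-segment choice has a higher mean accuracy than the best fixed column. The per-segment oracle uses hindsight that is unavailable in a real stream, which is the gap an online procedure such as the comparison phase would have to close.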

Paper Structure

This paper contains 31 sections, 3 theorems, 21 equations, 4 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

Perfect detection of concept drift may fail to yield optimal model performance in a streaming setting.

Figures (4)

  • Figure 1: A case study on the Airline dataset shows that the classic HDDM-W detector is overly sensitive, raising 36 alarms while reaching only 48.64% accuracy. With our DTD algorithm, the enhanced $\text{DTD}_{\text{HDDM-W}}$ detector dynamically adapts its threshold and triggers only three alarms, significantly boosting mean accuracy to 58.31%.
  • Figure 2: Comparison of accuracy on CIFAR10-CD dataset.
  • Figure 3: Ablation study of the $\text{DTD}_\text{PUDD}$ algorithm's hyper-parameter $K$, which sets the length of the comparison phase. Lines indicate the mean accuracy for each dataset, while shaded regions show the standard deviation calculated from multiple trials. PS is short for the powersupply dataset.
  • Figure 4: The critical difference diagram shows statistically significant superiority of $\text{DTD}_\text{PUDD}$ over SOTA methods.

Theorems & Definitions (3)

  • Theorem 1: Perfect Detection May Not Be Optimal
  • Theorem 2: No Single Threshold is Universally Optimal
  • Theorem 3: Dynamic Thresholds Outperform Stationary Thresholds
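
The relationship behind Theorem 3 can be written as a one-line inequality. This is a sketch in assumed notation, not the paper's formal statement: $P_i(\theta)$ denotes the model performance on segment $i$ of $n$ when threshold $\theta$ is used, $\Theta$ is a set of candidate thresholds on which each maximum is attained, and segment performances are assumed to add up independently.

$$\sum_{i=1}^{n} \max_{\theta \in \Theta} P_i(\theta) \;\ge\; \max_{\theta \in \Theta} \sum_{i=1}^{n} P_i(\theta)$$

The left side is the per-segment (dynamic) choice and the right side is the best stationary threshold. Under these assumptions the inequality is strict whenever no single $\theta$ maximizes every $P_i$ at once, since equality would force the best stationary threshold to be optimal on each segment individually.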