GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient

Yuang Zhang, Liping Wang, Yihong Huang, Yuanxing Zheng, Fan Zhang, Xuemin Lin

TL;DR

This work tackles unsupervised outlier detection (UOD) in contaminated datasets, where label-free evaluation makes training difficult due to misalignment between optimization and OD goals. It introduces GradStop, a gradient-based, label-free early-stopping method that uses GradSample to form two gradient sets and computes cohesion $\boldsymbol{C}$ and divergence $\boldsymbol{D}$ to monitor training dynamics relative to the inlier-priority OD mechanism. The approach is theoretically grounded and empirically validated across 47 real-world datasets and four deep UOD models, significantly improving AutoEncoder (AE) performance and outperforming state-of-the-art baselines. GradStop demonstrates robust, generalizable gains and offers a practical solution to prevent toxicity during UOD training, with potential extensions to integrate the metrics into optimization and to leverage pseudo-labeling for weak supervision.

Abstract

Unsupervised Outlier Detection (UOD) is a critical task in data mining and machine learning, aiming to identify instances that deviate significantly from the majority. Without labels, deep UOD methods struggle with the misalignment between the model's direct optimization goal and the final performance goal of the Outlier Detection (OD) task. From the perspective of training dynamics, this paper proposes an early stopping algorithm to optimize the training of deep UOD models, ensuring they perform optimally in OD rather than overfitting the entire contaminated dataset. Inspired by the UOD mechanism and the inlier priority phenomenon, in which models intuitively fit inliers more quickly than outliers, we propose GradStop, a sampling-based, label-free algorithm that estimates the model's real-time performance during training. First, a sampling method generates two sets, one likely containing more outliers and the other more inliers. Then a metric based on gradient cohesion is applied to probe the current training dynamics, which reflect the model's performance on the OD task. Experimental results on 4 deep UOD algorithms and 47 real-world datasets, together with theoretical proofs, demonstrate the effectiveness of the proposed early stopping algorithm in enhancing the performance of deep UOD models. An AutoEncoder (AE) enhanced by GradStop outperforms the plain AE, other SOTA UOD methods, and even ensemble AEs. Our method provides a robust and effective solution to the problem of performance degradation during training, enabling deep UOD models to reach more of their potential in anomaly detection tasks.
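
The two-set idea in the abstract can be illustrated with a minimal Python sketch. Several parts are assumptions that this summary does not confirm: cohesion is taken to be the mean pairwise cosine similarity of per-sample gradients, the hypothetical helper `split_by_loss` stands in for the paper's sampling method by picking the `k` highest-loss and `k` lowest-loss samples, and the gap is formed as $\mathbf{C}(G^{\text{last}})-\mathbf{C}(G^{\text{top}})$ with $G^{\text{last}}$ read as the low-loss (likely inlier) set. The toy gradients are made up for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cohesion(grads):
    """ASSUMED definition: mean pairwise cosine similarity within a gradient set."""
    pairs = list(combinations(grads, 2))
    if not pairs:
        raise ValueError("need at least two gradients")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def split_by_loss(losses, grads, k):
    """Hypothetical stand-in for the sampling step: split gradients by loss rank."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    g_last = [grads[i] for i in order[:k]]   # lowest loss: likely more inliers
    g_top = [grads[i] for i in order[-k:]]   # highest loss: likely more outliers
    return g_top, g_last


def cohesion_gap(losses, grads, k):
    """Cohesion of the low-loss set minus cohesion of the high-loss set."""
    g_top, g_last = split_by_loss(losses, grads, k)
    return cohesion(g_last) - cohesion(g_top)


# Toy data: three low-loss samples with aligned gradients,
# three high-loss samples with scattered gradients.
toy_losses = [0.1, 0.2, 0.15, 2.0, 1.8, 2.2]
toy_grads = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
             [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
print(cohesion_gap(toy_losses, toy_grads, k=3))
```

In this toy case the low-loss gradients point in nearly the same direction while the high-loss gradients do not, so the gap is positive. How the gap is turned into a stopping decision is not described in this summary and is deliberately left out of the sketch.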

Paper Structure

This paper contains 32 sections, 1 theorem, 23 equations, 10 figures, 10 tables, 2 algorithms.

Key Result

Theorem 4.1

Under certain assumptions, if $r_t > \cos{\theta_t}R + \sqrt{\cos^2{\theta_t}R^2 + 2R + 1}$, then the loss-decrease speed gap satisfies $\tilde{\triangle}^f_t>0$, which means inlier priority strengthens at epoch $t$.
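
To get a feel for the size of the bound in Theorem 4.1, the right-hand side can be evaluated numerically. The sketch below treats $r_t$, $\cos\theta_t$, and $R$ as plain numbers; their definitions and the theorem's assumptions are in the paper and are not reproduced in this summary. The function names are hypothetical.

```python
import math


def inlier_priority_threshold(cos_theta, R):
    """Right-hand side of the Theorem 4.1 condition:
    cos(theta_t) * R + sqrt(cos^2(theta_t) * R^2 + 2R + 1)."""
    radicand = cos_theta ** 2 * R ** 2 + 2 * R + 1
    if radicand < 0:
        raise ValueError("radicand is negative for these inputs")
    return cos_theta * R + math.sqrt(radicand)


def condition_holds(r_t, cos_theta, R):
    """True when r_t strictly exceeds the threshold."""
    return r_t > inlier_priority_threshold(cos_theta, R)


# Illustrative values only, not taken from the paper.
for cos_theta in (-1.0, 0.0, 1.0):
    print(cos_theta, inlier_priority_threshold(cos_theta, R=1.0))
```

For $\cos\theta_t = \pm 1$ the radicand is the perfect square $(R+1)^2$, so with $R \ge 0$ the threshold reduces to $2R+1$ when $\cos\theta_t = 1$ and to $1$ when $\cos\theta_t = -1$. At fixed $R \ge 0$ the threshold grows with $\cos\theta_t$.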

Figures (10)

  • Figure 1: Performance degradation: UOD training process of AutoEncoder on clean dataset shuttle and original polluted shuttle.
  • Figure 2: Various performance trends: UOD training process of AutoEncoder and DeepSVDD on dataset vowels.
  • Figure 3: Training dynamics of AE training on dataset cover, in which outlier proportion is $0.96\%$. Dark green denotes the inliers, and orange denotes the outliers.
  • Figure 4: Case studies on datasets glass, wine, and optdigits, showing close correspondence between $\mathbf{C}^{\Delta}$ and the variation trend of AUC. Top: AUC. Middle: $\mathbf{C}(G^{\text{last}})$ and $\mathbf{C}(G^{\text{top}})$. Bottom: $\mathbf{C}^{\Delta}=\mathbf{C}(G^{\text{last}})-\mathbf{C}(G^{\text{top}})$.
  • Figure 5: AE: AUC curves vs. $\mathbf{C}^{\Delta}$ curves. Top: AUC. Middle: $\mathbf{C}(G^{\text{last}})$ and $\mathbf{C}(G^{\text{top}})$. Bottom: $\mathbf{C}^{\Delta}=\mathbf{C}(G^{\text{last}})-\mathbf{C}(G^{\text{top}})$.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Theorem 4.1
  • proof