Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers

Yiqun Zhang, Zexi Tan, Xiaopeng Luo, Yunlin Liu

Abstract

Most real-world IoT data analysis tasks, such as clustering and anomaly event detection, are unsupervised and highly susceptible to the presence of outliers. In addition to sporadic scattered outliers caused by factors such as faulty sensor readings, IoT systems often exhibit clustered outliers. These occur when multiple devices or nodes produce similar anomalous measurements, for instance, owing to localized interference, emerging security threats, or regional false alarms, forming micro-clusters. These clustered outliers can be easily mistaken for normal behavior because of their relatively high local density, thereby obscuring the detection of both scattered and contextual anomalies. To address this, we propose a novel outlier detection paradigm that leverages the natural neighboring relationships using graph structures. This facilitates multi-perspective anomaly evaluation by incorporating reference sets at both local and global scales derived from the graph. Our approach enables the effective recognition of scattered outliers without interference from clustered anomalies, while the graph structure simultaneously helps reveal and isolate clustered outlier groups. Extensive experiments, including comparative performance analysis, ablation studies, validation on downstream clustering tasks, and evaluation of hyperparameter sensitivity, demonstrate the efficacy of the proposed method. The source code is available at https://github.com/gordonlok/DROD.

Paper Structure (29 sections, 9 equations, 9 figures, 12 tables, 3 algorithms)

This paper contains 29 sections, 9 equations, 9 figures, 12 tables, 3 algorithms.

Figures (9)

  • Figure 1: Illustration of scatterliers (gray cross), clusterliers (red square), collective outliers (red cross), and the masking effect in outlier detection (red dotted arrow) in a dataset with normal clustered samples (dotted circle).
  • Figure 2: Pipeline of the proposed method. Natural neighbor subsets containing natural neighboring samples are constructed based on the NB definition in the preliminaries section. Then, the Subset Anomaly Index (SAI) and Local Anomaly Index (LAI) are computed to comprehensively reflect the sample abnormality under the complex co-occurrence of scatterliers and clusterliers. SAI is derived from the degree of adjacency (i.e., link strength) among subsets, indicating the subset-level abnormality of samples. LAI measures the local abnormality of samples within each subset based on distribution density. Together, these two indices enable outlier detection that is free from bias and from the masking effect of clusterliers.
  • Figure 3: Two basic synthetic datasets C1 and C2 for generating various types of outliers.
  • Figure 4: Visualization of synthetic datasets D1-D12.
  • Figure 5: AUC comparison on D1-D12. D1 and D2 contain only clusterliers; the ten datasets D3-D12 contain both scatterliers and clusterliers.
  • ...and 4 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • proof