
Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls

Can Gao, Xiaofeng Tan, Jie Zhou, Weiping Ding, Witold Pedrycz

TL;DR

This work tackles the challenge of detecting diverse outliers by turning unsupervised outlier detection into a semi-supervised classification problem. It introduces a framework that fuses fuzzy rough sets with relative granule density and multi-scale granular-ball views, using three-way decisions and a weighted SVM to produce a refined outlier probability vector $\widetilde{P}$. The approach combines a density-enhanced fuzzy similarity (via $Den_a$, $Rel\_Den_a$, and $\tilde{R}_a$) with multi-scale representations to identify local, global, and group outliers, and it outperforms state-of-the-art methods on artificial and UCI datasets, with AUROC gains of at least $8.48\%$. Backed by statistical significance tests, the multi-scale fusion provides a practical, adaptable tool for robust anomaly detection across heterogeneous data, with potential extensions to semi-supervised settings and refined fusion strategies.
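The quantities $Den_a$ and $Rel\_Den_a$ are named but not defined in this summary. As a minimal illustration only, assume a single attribute $a$ scaled to $[0, 1]$, the common fuzzy similarity $R_a(x, y) = 1 - |a(x) - a(y)|$, a granule density $Den_a(x)$ equal to the average membership of all samples in the fuzzy granule of $x$, and a relative density $Rel\_Den_a(x)$ equal to $Den_a(x)$ divided by the mean density of its $k$ nearest neighbors. These formulas, the parameter `k`, and the function names are illustrative assumptions rather than the paper's definitions, and the density-enhanced similarity $\tilde{R}_a$ is not reproduced here.

```python
from typing import List


def fuzzy_similarity(u: float, v: float) -> float:
    # Assumed similarity for values in [0, 1]: R_a(x, y) = 1 - |a(x) - a(y)|.
    return 1.0 - abs(u - v)


def granule_density(values: List[float]) -> List[float]:
    # Den_a(x) (illustrative): mean membership of all samples in the fuzzy granule of x.
    n = len(values)
    return [sum(fuzzy_similarity(x, y) for y in values) / n for x in values]


def relative_density(values: List[float], k: int = 2) -> List[float]:
    # Rel_Den_a(x) (hypothetical LOF-style ratio): Den_a(x) divided by the mean
    # Den_a of the k nearest neighbors of x on attribute a.
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < number of samples")
    den = granule_density(values)
    rel = []
    for i, x in enumerate(values):
        others = [j for j in range(len(values)) if j != i]
        # k samples closest to x on this attribute (x itself excluded)
        neighbors = sorted(others, key=lambda j: abs(values[j] - x))[:k]
        mean_neighbor_den = sum(den[j] for j in neighbors) / k
        rel.append(den[i] / mean_neighbor_den)
    return rel
```

Under these assumptions, a sample whose relative density falls well below 1 sits in a sparser region than its neighbors, which is the intuition behind using density information to expose local outliers.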

Abstract

Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and used in a variety of practical tasks. However, most unsupervised outlier detection methods are designed to detect specific types of outliers, while real-world data often contain a mixture of outlier types. In this study, we propose a multi-scale outlier detection method based on fuzzy rough sets to identify various types of outliers. Specifically, a novel fuzzy rough sets-based method that integrates relative fuzzy granule density is first introduced to improve the detection of local outliers. Then, a multi-scale view generation method based on granular-ball computing is proposed to collaboratively identify group outliers at different levels of granularity. Moreover, reliable outliers and inliers identified by three-way decisions are used to train a weighted support vector machine to further improve outlier detection performance. The proposed method transforms unsupervised outlier detection into a semi-supervised classification problem and, for the first time, explores fuzzy rough sets-based outlier detection from the perspective of multi-scale granular balls, making it highly adaptable to different types of outliers. Extensive experiments on both artificial and UCI datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods, improving the Area Under the ROC Curve (AUROC) by at least 8.48%. The source code is available at https://github.com/Xiaofeng-Tan/MGBOD.
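The step in which three-way decisions select reliable outliers and inliers to train a weighted classifier can be illustrated with a minimal sketch. It assumes outlier scores already normalized to $[0, 1]$, hypothetical thresholds `alpha = 0.7` and `beta = 0.3`, and a hypothetical weighting rule in which each sample's weight is its confidence in the assigned label (the score for outliers, one minus the score for inliers). The paper's actual thresholds and weights may differ.

```python
from typing import List, Tuple


def three_way_split(
    scores: List[float], alpha: float = 0.7, beta: float = 0.3
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into POS (reliable outliers), NEG (reliable inliers)
    and BND (boundary, deferred) regions. alpha and beta are illustrative
    defaults, not values taken from the paper."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    pos, neg, bnd = [], [], []
    for i, s in enumerate(scores):
        if s >= alpha:
            pos.append(i)  # confidently anomalous
        elif s <= beta:
            neg.append(i)  # confidently normal
        else:
            bnd.append(i)  # uncertain: excluded from training
    return pos, neg, bnd


def weighted_training_set(
    scores: List[float], alpha: float = 0.7, beta: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Return (index, label, weight) triples with label 1 = outlier, 0 = inlier.
    Hypothetical weighting: weight = confidence in the assigned label."""
    pos, neg, _ = three_way_split(scores, alpha, beta)
    return [(i, 1, scores[i]) for i in pos] + [(i, 0, 1.0 - scores[i]) for i in neg]
```

The labels and weights could then be passed to any classifier that accepts per-sample weights, for example scikit-learn's `SVC.fit(X, y, sample_weight=w)`, and the trained model would then score the deferred boundary samples.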

Paper Structure

This paper contains 19 sections, 37 equations, 9 figures, 6 tables, and 3 algorithms.

Figures (9)

  • Figure 1: Framework of multi-scale outlier detection with granular balls.
  • Figure 2: The results of local outlier detection on the synthetic dataset. a) The results of outlier detection without density information; b) The results of outlier detection using density information (visualized in 3-dimensional space).
  • Figure 3: Multi-scale view generation with granular-ball computing on a synthetic dataset. a) The original view and outlier detection results; b) the 2nd scale view; c) the 3rd scale view; d) the 4th scale view; e) the last scale view; f) granular-ball updating. During multi-scale view generation, the granular balls $GB_1^i$, $GB_2^i$, and $GB_3^i$ in the $i$-th scale view are merged into a coarser granular ball $GB_1^{i+1}$ in the $(i+1)$-th scale view.
  • Figure 4: The synthesized datasets visualized in 3-dimensional PCA space. a) Local outliers; b) Global outliers; c) Group outliers.
  • Figure 5: The Friedman test diagram.
  • ...and 4 more figures
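Figure 3 describes coarser views produced by merging granular balls from the $i$-th scale into larger balls at scale $i+1$. The sketch below illustrates that idea under stated assumptions: each sample starts as its own ball, a ball's center is the mean of its points and its radius is the mean distance to that center, and at each scale the two balls with the closest centers are merged repeatedly while that distance stays within a per-scale threshold. The merging rule, the thresholds, and all names are hypothetical; the paper's granular-ball generation and updating criteria may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class GranularBall:
    points: List[Point]

    @property
    def center(self) -> Point:
        # Center = coordinate-wise mean of the covered points.
        n = len(self.points)
        return tuple(sum(coords) / n for coords in zip(*self.points))

    @property
    def radius(self) -> float:
        # Radius = mean distance to the center (assumed convention; max distance is another option).
        c = self.center
        return sum(math.dist(p, c) for p in self.points) / len(self.points)


def coarsen(balls: List[GranularBall], threshold: float) -> List[GranularBall]:
    """Hypothetical merge rule: repeatedly merge the two balls with the closest
    centers while that distance is <= threshold."""
    balls = list(balls)
    while len(balls) > 1:
        d, i, j = min(
            (math.dist(balls[i].center, balls[j].center), i, j)
            for i in range(len(balls))
            for j in range(i + 1, len(balls))
        )
        if d > threshold:
            break
        merged = GranularBall(balls[i].points + balls[j].points)
        balls = [b for k, b in enumerate(balls) if k not in (i, j)] + [merged]
    return balls


def multi_scale_views(
    points: Sequence[Point], thresholds: Sequence[float]
) -> List[List[GranularBall]]:
    """Scale 1 treats every sample as its own ball; each threshold yields a coarser view."""
    views = [[GranularBall([tuple(p)]) for p in points]]
    for t in thresholds:
        views.append(coarsen(views[-1], t))
    return views
```

For example, `multi_scale_views([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)], [1.5, 5.0])` ends with two balls of 4 and 2 points: the small, distant ball persists at the coarse scale, which is the kind of structure a group outlier produces.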