Label-Informed Outlier Detection Based on Granule Density

Baiyang Chen; Zhong Yuan; Dezhong Peng; Hongmei Chen; Xiaomin Song; Huiming Zheng

TL;DR

The paper tackles outlier detection in heterogeneous data with limited labeled examples. It introduces the Granule Density-based Outlier Factor (GDOF), a label-informed framework that uses fuzzy granulation and granule density to model uncertainty and diverse data types. Attribute relevance is learned from a small number of labeled outliers and used to aggregate per-attribute granule densities into per-object outlier scores, enabling robust detection with few labels. Experiments on 20 real-world datasets show competitive performance across data types and parameter settings, and the code is publicly available.

Abstract

Outlier detection, crucial for identifying unusual patterns with significant implications across numerous applications, has drawn considerable research interest. Existing semi-supervised methods typically treat data as purely numerical and deterministic, thereby neglecting the heterogeneity and uncertainty inherent in complex, real-world datasets. This paper introduces a label-informed outlier detection method for heterogeneous data based on Granular Computing and Fuzzy Sets, namely the Granule Density-based Outlier Factor (GDOF). Specifically, GDOF first employs label-informed fuzzy granulation to represent various data types and develops granule density for precise density estimation. Subsequently, granule densities from individual attributes are integrated for outlier scoring by assessing attribute relevance with a limited number of labeled outliers. Experimental results on various real-world datasets show that GDOF stands out in detecting outliers in heterogeneous data with a minimal number of labeled outliers. The integration of Fuzzy Sets and Granular Computing in GDOF offers a practical framework for outlier detection in complex and diverse data types. All relevant datasets and source code are publicly available for further research. This is the author's accepted manuscript of a paper published in IEEE Transactions on Fuzzy Systems. The final version is available at https://doi.org/10.1109/TFUZZ.2024.3514853
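
To make the pipeline described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. Every formula in it is an assumption chosen for simplicity: the fuzzy similarity (range-scaled distance for numeric attributes, equality for nominal ones), the granule density (mean membership of an object's fuzzy granule), and the relevance weights (how much lower the labeled outliers' density is than average on each attribute). The function names `fuzzy_similarity`, `granule_density`, `attribute_weights`, and `outlier_scores` are hypothetical.

```python
import numpy as np


def fuzzy_similarity(col, numeric):
    """Pairwise fuzzy similarity in [0, 1] for one attribute (assumed form)."""
    if numeric:
        col = np.asarray(col, dtype=float)
        span = col.max() - col.min()
        scaled = (col - col.min()) / span if span > 0 else np.zeros_like(col)
        # Similarity decays linearly with range-scaled distance.
        return 1.0 - np.abs(scaled[:, None] - scaled[None, :])
    # Nominal attribute: crisp equality used as a degenerate fuzzy relation.
    _, codes = np.unique(np.asarray(col).astype(str), return_inverse=True)
    return (codes[:, None] == codes[None, :]).astype(float)


def granule_density(sim):
    """Assumed density: mean membership of each object's fuzzy granule."""
    return sim.mean(axis=1)


def attribute_weights(densities, labeled_outliers):
    """Assumed relevance: attributes on which labeled outliers are sparser
    than the average object receive larger weights (weights sum to 1)."""
    gap = densities.mean(axis=0) - densities[labeled_outliers].mean(axis=0)
    gap = np.clip(gap, 0.0, None)
    if gap.sum() == 0:
        # No attribute separates the labeled outliers: fall back to uniform.
        return np.full(densities.shape[1], 1.0 / densities.shape[1])
    return gap / gap.sum()


def outlier_scores(columns, numeric_flags, labeled_outliers):
    """Weighted sum over attributes of (1 - granule density)."""
    densities = np.column_stack([
        granule_density(fuzzy_similarity(c, f))
        for c, f in zip(columns, numeric_flags)
    ])
    weights = attribute_weights(densities, labeled_outliers)
    return (1.0 - densities) @ weights


# Toy mixed-type data: one numeric and one nominal attribute.
columns = [[1.0, 1.1, 0.9, 1.0, 5.0], ["a", "a", "a", "a", "b"]]
scores = outlier_scores(columns, [True, False], labeled_outliers=[4])
print(np.round(scores, 3))
```

In this toy setting a single labeled outlier (index 4) is enough to weight both attributes, since it sits in a sparse granule on each of them. The paper's actual granulation, density, and relevance definitions may differ substantially from these stand-ins.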
