Outlier detection in mixed-attribute data: a semi-supervised approach with fuzzy approximations and relative entropy

Baiyang Chen, Zhong Yuan, Zheng Liu, Dezhong Peng, Yongxiang Li, Chang Liu, Guiduo Duan

TL;DR

This paper tackles outlier detection in mixed-attribute data under limited supervision by introducing FROD, a semi-supervised framework based on fuzzy rough sets. It combines attribute-level contribution via attribute classification accuracy with a fuzzy relative entropy measure to quantify uncertainty-driven outlierness from unlabeled data. The method achieves competitive or superior performance on 16 public datasets, particularly excelling with nominal attributes and demonstrating data efficiency with minimal labeled data. The approach offers a principled way to leverage both uncertainty modeling and labeled information for robust outlier detection in heterogeneous data environments.

Abstract

Outlier detection is a critical task in data mining, aimed at identifying objects that significantly deviate from the norm. Semi-supervised methods improve detection performance by leveraging partially labeled data but typically overlook the uncertainty and heterogeneity of real-world mixed-attribute data. This paper introduces a semi-supervised outlier detection method, namely fuzzy rough sets-based outlier detection (FROD), to effectively handle these challenges. Specifically, we first utilize a small subset of labeled data to construct fuzzy decision systems, through which we introduce the attribute classification accuracy based on fuzzy approximations to evaluate the contribution of attribute sets in outlier detection. Unlabeled data is then used to compute fuzzy relative entropy, which provides a characterization of outliers from the perspective of uncertainty. Finally, we develop the detection algorithm by combining attribute classification accuracy with fuzzy relative entropy. Experimental results on 16 public datasets show that FROD is comparable with or better than leading detection algorithms. All datasets and source codes are accessible at https://github.com/ChenBaiyang/FROD. This manuscript is the accepted author version of a paper published by Elsevier. The final published version is available at https://doi.org/10.1016/j.ijar.2025.109373
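The fuzzy approximations mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository linked above). It assumes a commonly used mixed-attribute fuzzy similarity (exact match for nominal attributes, a thresholded `1 - |x - y|` for numeric attributes scaled to [0, 1], combined by minimum) and the standard Dubois-Prade fuzzy lower and upper approximations. The function names, the `delta` threshold, and the toy data are all hypothetical.

```python
import numpy as np

def fuzzy_similarity(X, is_nominal, delta=0.5):
    """Pairwise fuzzy similarity on mixed attributes (assumed form, not from the paper).

    X: (n, m) array; numeric columns scaled to [0, 1], nominal columns as integer codes.
    is_nominal: length-m sequence of booleans.
    delta: hypothetical cut-off (<= 1) beyond which numeric similarity is set to 0.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    R = np.ones((n, n))
    for a in range(m):
        col = X[:, a]
        diff = np.abs(col[:, None] - col[None, :])
        if is_nominal[a]:
            r = (diff == 0).astype(float)            # 1 if same category, else 0
        else:
            r = np.where(diff <= delta, 1.0 - diff, 0.0)
        R = np.minimum(R, r)                         # combine attributes by minimum
    return R

def fuzzy_approximations(R, d):
    """Dubois-Prade fuzzy lower/upper approximations of a decision class.

    R: (n, n) reflexive fuzzy similarity matrix.
    d: length-n membership of each object in the decision class (e.g. 1 = outlier).
    """
    d = np.asarray(d, dtype=float)
    lower = np.min(np.maximum(1.0 - R, d[None, :]), axis=1)
    upper = np.max(np.minimum(R, d[None, :]), axis=1)
    return lower, upper

# Toy data: one numeric and one nominal attribute; the third object is labeled an outlier.
X = [[0.10, 0], [0.15, 0], [0.90, 1]]
R = fuzzy_similarity(X, is_nominal=[False, True])
lower, upper = fuzzy_approximations(R, d=[0, 0, 1])
```

In this toy case the labeled outlier shares no similarity with the other two objects, so it belongs fully to both approximations of the outlier class, while the two normal objects belong to neither. How FROD turns such approximations into attribute classification accuracy is defined in the paper and is not reproduced here.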

Paper Structure

This paper contains 25 sections, 1 theorem, 20 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $O=O_1\cup \{o_k\}$ be the set of objects with the attributes $B$ such that, for all $o_i, o_j \in O_1$, $d_{ij}\leq \delta$ and $d_{ik}> \delta$. Then $FRE(o_i) > FRE(o_k)$ for every $o_i \in O_1$.

Figures (8)

  • Figure 1: Overall structure of FROD. The method uses unlabeled data to calculate outlier factors based on fuzzy relative entropy. Then, a small amount of labeled data is used to build fuzzy decision systems, which make it possible to assess the contribution of attributes through attribute classification accuracy. Objects with consistently high outlier factors and significant attribute classification accuracy are identified as outliers.
  • Figure 2: Illustration of attribute classification accuracy with respect to the attribute set $B$. The orange line indicates the membership degree of the fuzzy upper approximation for each object, and the blue line denotes that of the fuzzy lower approximation. The weighted proportion of the blue area to the orange area for negative and positive objects represents the attribute classification accuracy of $B$.
  • Figure 3: An example with 100 normal points and 10 outliers: (a) 2D plot of the dataset. (b) Fuzzy relative entropy for each object.
  • Figure 4: Average AUC (%) on datasets with and without nominal attributes (@1% labeled data)
  • Figure 5: Average AP (%) on datasets with and without nominal attributes (@1% labeled data)
  • ...and 3 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Theorem 1
  • proof
  • Definition 8
  • ...and 2 more