Outlyingness Scores with Cluster Catch Digraphs

Rui Shi, Elvan Ceyhan, Nedret Billor

TL;DR

The paper addresses the need for interpretable, per-observation outlyingness scores in high-dimensional data by introducing two graph-based scores, OOS and IOS, derived from Cluster Catch Digraphs (CCDs). OOS compares a point's local density to that of its outbound neighbors, while IOS aggregates the influence a point receives from its inbound neighbors and inverts it to yield a robust outlyingness measure; IOS is additionally standardized so that scores are comparable across clusters. Across extensive Monte Carlo simulations and real-data experiments, IOS consistently outperforms OOS and existing CCD-based methods, especially in high dimensions and under masking, while OOS remains effective for global and local outliers. Within the CCD framework, these outlyingness scores (OSs) offer a principled, interpretable, and robust approach to outlier detection that applies to complex, high-dimensional datasets.

Abstract

This paper introduces two novel outlyingness scores (OSs) based on Cluster Catch Digraphs (CCDs): the Outbound Outlyingness Score (OOS) and the Inbound Outlyingness Score (IOS). These scores enhance the interpretability of outlier detection results. Both OSs employ graph-, density-, and distribution-based techniques, tailored to high-dimensional data with varying cluster shapes and intensities. OOS evaluates the outlyingness of a point relative to its nearest neighbors, while IOS assesses the total "influence" a point receives from others within its cluster. Both OSs effectively identify global and local outliers and are invariant to data collinearity. Moreover, IOS is robust to the masking problem. With extensive Monte Carlo simulations, we compare the performance of both OSs with CCD-based, traditional, and state-of-the-art outlier detection methods. Both OSs exhibit substantial overall improvements over the CCD-based methods on both artificial and real-world data sets. IOS in particular delivers the best overall performance among all the methods, especially in high-dimensional settings.

Keywords: Outlier detection, Outlyingness score, Graph-based clustering, Cluster catch digraphs, High-dimensional data.

Paper Structure (15 sections, 10 equations, 7 figures, 16 tables)

Figures (7)

  • Figure 1: An example of OOS on an artificial dataset with UN-CCDs. Black points are regular points and red crosses are outliers.
  • Figure 2: An example of IOS with UN-CCDs on the same artificial dataset as Figure 1.
  • Figure 3: Some standardized IOS values (rounded to one decimal place) with UN-CCDs on the same artificial dataset as Figure 1.
  • Figure 4: Line plots of the TPRs and TNRs of all CCD-based OSs, under the simulation settings (with uniform clusters) elaborated in [shi2024outlier].
  • Figure 5: Line plots of the BAs and $F_2$-scores of all CCD-based OSs, under the simulation settings (with uniform clusters) elaborated in our previous work [shi2024outlier].
  • ...and 2 more figures

Theorems & Definitions (7)

  • Definition 1.1: Outbound Neighbors
  • Definition 1.2: Vicinity Density
  • Definition 1.3: Outbound Outlyingness Score (OOS)
  • Definition 1.4: Inbound Neighbors
  • Definition 1.5: Cumulative Influence
  • Definition 1.6: Inbound Outlyingness Score (IOS)
  • Definition 1.7
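
The outline above gives only the names of the definitions, so the following is a toy illustration of the two ideas as the TL;DR describes them, not the authors' formulas. It assumes a simple stand-in for the CCD covering balls: each point gets a ball whose radius is the distance to its k-th nearest neighbor, with an arc from x_i to every point its ball catches. The function name, the density form (points caught divided by r^d), the use of the median, the in-degree as "cumulative influence", and the 1/(1 + influence) inversion are all assumptions made for illustration. The clustering step and the within-cluster standardization of IOS are omitted.

```python
import numpy as np

def toy_outlyingness_scores(X, k=4):
    """Toy OOS-like and IOS-like scores on a k-nearest-neighbor ball digraph.

    ASSUMPTION: in the paper the covering balls come from Cluster Catch
    Digraphs; here each radius is just the distance to the k-th nearest
    neighbor. Points are assumed distinct, so every radius is positive.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never counted as its own neighbor.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    # Ball radius of point i: distance to its k-th nearest neighbor.
    r = np.sort(D, axis=1)[:, k - 1]
    # A[i, j] is True when x_j lies in the ball of x_i (arc i -> j).
    A = D <= r[:, None]
    # Vicinity density (assumed form): points caught per unit of r**d.
    density = A.sum(axis=1) / r**d
    # OOS-like score: median density of the outbound neighbors divided by
    # the point's own density (large when the point sits in a sparse spot).
    oos = np.array([np.median(density[A[i]]) / density[i] for i in range(n)])
    # IOS-like score: the in-degree stands in for cumulative influence and
    # is inverted, so points that no ball catches get the largest score.
    influence = A.sum(axis=0)
    ios = 1.0 / (1.0 + influence)
    return oos, ios

# A 5x5 unit grid as one cluster, plus a single far-away point (index 25).
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
oos, ios = toy_outlyingness_scores(X, k=4)
print(int(np.argmax(oos)), int(np.argmax(ios)))  # prints: 25 25
```

In this toy setup both scores peak at the isolated point: its ball is large and sparse compared with those of the grid points it reaches, and no grid point's ball reaches it. The sketch does not attempt to reproduce the masking robustness or the high-dimensional behavior reported in the abstract.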