Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)

Philipp Röchner; Henrique O. Marques; Ricardo J. G. B. Campello; Arthur Zimek; Franz Rothlauf

TL;DR

This paper proposes robust statistical scaling, which uses robust estimators when transforming outlier scores into outlier probabilities. Several variants of the method are evaluated against other outlier score transformations on real-world datasets and outlier detection algorithms, where it can improve the probabilities for outliers.

Abstract

Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be difficult for humans to interpret. Statistical scaling addresses this problem by transforming outlier scores into outlier probabilities without using ground-truth labels, thereby improving interpretability and comparability across algorithms. However, the quality of this transformation can be different for outliers and inliers. Missing outliers in scenarios where they are of particular interest - such as healthcare, finance, or engineering - can be costly or dangerous. Thus, ensuring good probabilities for outliers is essential. This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers. Therefore, we propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers. We evaluate several variants of our method against other outlier score transformations for real-world datasets and outlier detection algorithms, where it can improve the probabilities for outliers.

Paper Structure (16 sections, 6 equations, 4 figures)

This paper contains 16 sections, 6 equations, 4 figures.

Introduction
Related Work
Problem Statement
Background: Non-robust Statistical Scaling of Outlier Scores
Robust Statistical Scaling of Outlier Scores
Experiments
Datasets and Outlier Detection Algorithms
Outlier Score Transformations
Evaluation of Outlier Score Transformations
Results
Are the outlier probabilities computed by non-robust Gaussian scaling similarly good for outliers and inliers?
Does robust Gaussian scaling improve the probabilities of outliers compared to non-robust Gaussian scaling?
Which Gaussian scaling variant best balances the overall quality of the probabilities for outliers and inliers?
Conclusions
Acknowledgments.
...and 1 more sections

Figures (4)

Figure 1: Transformation of outlier scores to outlier probabilities using non-robust and robust Gaussian scaling. The $k$-Nearest Neighbors Detector [ramaswamy2000efficient] on the Ionosphere dataset [campos2016evaluation] computes the outlier scores (Figure \ref{fig:outlier_scores}). Gaussian scaling [kriegel2011interpreting] and robust Gaussian scaling (Figure \ref{fig:transformation}) transform the outlier scores into outlier probabilities (Figure \ref{fig:outlier_probabilities}). We evaluate the outlier probabilities of both transformations separately for outliers and inliers (Figure \ref{fig:residuals}). In Figure \ref{fig:outlier_scores}, the robust Gaussian density function better approximates the outlier scores of the inliers than the non-robust Gaussian density function. As a result, robust Gaussian scaling reflects the proportion of outliers and inliers better, as shown in Figure \ref{fig:transformation}. In Figure \ref{fig:outlier_probabilities}, robust Gaussian scaling correctly pushes the outlier probabilities of the outliers to one. Finally, the outlier probabilities of robust Gaussian scaling have lower residuals for the outliers while slightly increasing the residuals for the inliers (see Figure \ref{fig:residuals}).
Figure 2: Skill scores $\mathrm{MSS}(\mathrm{GS}^{\mathit{outlier}}, \mathrm{GS}^{\mathit{inlier}})$ of the probabilities computed by non-robust Gaussian scaling for outliers $\mathrm{GS}^{\mathit{outlier}}$ compared to the probabilities for inliers $\mathrm{GS}^{\mathit{inlier}}$. A positive skill score indicates better, a skill score of zero indicates equal, and a negative skill score indicates inferior outlier probabilities $\mathrm{GS}^{\mathit{outlier}}$ compared to $\mathrm{GS}^{\mathit{inlier}}$. Overall, non-robust Gaussian scaling computes worse probabilities for the outliers than for the inliers on all four measures examined.
Figure 3: Skill scores $\mathrm{MSS}(\mathrm{T}^{\mathit{outlier}}, \mathrm{GS}^{\mathit{outlier}})$ of probabilities for outliers computed by outlier score transformations $\mathrm{T}^{\mathit{outlier}}$, which are linear scaling and variants of robust Gaussian scaling, compared to probabilities for outliers computed by non-robust Gaussian scaling $\mathrm{GS}^{\mathit{outlier}}$. A positive skill score indicates better, a skill score of zero indicates equal, and a negative skill score indicates inferior outlier probabilities $\mathrm{T}^{\mathit{outlier}}$ compared to $\mathrm{GS}^{\mathit{outlier}}$. Overall, all variants of robust Gaussian scaling improve the Brier score (Figure \ref{fig:improvement_stratified_brier_score_outliers}) for outliers compared to non-robust Gaussian scaling. Similarly, most variants of robust Gaussian scaling improve the sharpness (Figure \ref{fig:improvement_stratified_sharpness_error_outliers}) and refinement errors for outliers (Figure \ref{fig:improvement_stratified_refinement_error_outliers}) compared to non-robust Gaussian scaling. For the calibration error for outliers (Figure \ref{fig:improvement_stratified_calibration_error_outliers}), the outlier probabilities of robust Gaussian scaling are inferior to the outlier probabilities of non-robust Gaussian scaling. For clarity, we do not display skill scores less (greater) than $1.5$ times the first (third) quartile.
Figure 4: Mean rank of linear scaling, non-robust, and robust Gaussian scaling variants for the harmonic improvement score of the stratified sharpness, refinement, and calibration errors for outliers and inliers: Gaussian scaling with sample mean as center and nMAD as scale performs best.
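
As a rough illustration of the transformation described in the Figure 1 caption, the sketch below applies Gaussian scaling in the commonly cited form max(0, erf((s - center) / (scale * sqrt(2)))) with three center/scale pairs. The paper's own definitions are not reproduced on this page, so several details are assumptions made for illustration: the pairing of median with nMAD as a "robust" variant, the nMAD constant 1.4826, the function names, and the example scores.

```python
import math
import statistics


def nmad(scores):
    """Normalized median absolute deviation.

    The factor 1.4826 makes the MAD comparable to the standard deviation
    for Gaussian data (assumed normalization, not taken from the paper).
    """
    med = statistics.median(scores)
    return 1.4826 * statistics.median([abs(s - med) for s in scores])


def gaussian_scaling(scores, center, scale):
    """Map scores to [0, 1] via max(0, erf((s - center) / (scale * sqrt(2))))."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [
        max(0.0, math.erf((s - center) / (scale * math.sqrt(2))))
        for s in scores
    ]


# Hypothetical outlier scores; the last two play the role of outliers.
scores = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 6.0, 7.5]

# Non-robust: sample mean and standard deviation, both inflated by the outliers.
non_robust = gaussian_scaling(
    scores, statistics.mean(scores), statistics.pstdev(scores)
)
# Robust (assumed variant): median as center, nMAD as scale.
robust = gaussian_scaling(scores, statistics.median(scores), nmad(scores))
# Variant named in the Figure 4 caption: sample mean as center, nMAD as scale.
mean_nmad = gaussian_scaling(scores, statistics.mean(scores), nmad(scores))

for name, probs in [
    ("non-robust", non_robust),
    ("median/nMAD", robust),
    ("mean/nMAD", mean_nmad),
]:
    print(name, [round(p, 3) for p in probs])
```

On this toy input, the outliers inflate the sample mean and standard deviation, so the non-robust variant assigns the two large scores probabilities visibly below one, while the nMAD-based variants push them close to one. This only mirrors the qualitative behaviour described in the caption; it says nothing about the paper's measured results.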

Theorems & Definitions (4)

Definition: Non-robust Gaussian Scaling
Definition: Robust Gaussian Scaling
Definition: Skill Score
Definition: Harmonic Improvement Score
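
The skill score definition itself is not reproduced on this page. A minimal sketch, assuming the conventional form 1 - M / M_ref for an error measure M where lower is better and zero is perfect, is given below; it matches the sign convention stated in the figure captions (positive is better than the reference, zero is equal, negative is worse). The stratified Brier score used as the measure, the function names, and the example data are assumptions for illustration only.

```python
def stratified_brier(probabilities, labels, target):
    """Brier score restricted to observations whose label equals `target`.

    Labels are assumed to be 1 for outliers and 0 for inliers.
    """
    errors = [(p - y) ** 2 for p, y in zip(probabilities, labels) if y == target]
    if not errors:
        raise ValueError("no observations with the requested label")
    return sum(errors) / len(errors)


def skill_score(measure, reference):
    """Assumed conventional skill score: 1 - measure / reference."""
    if reference == 0:
        raise ValueError("reference measure must be non-zero")
    return 1.0 - measure / reference


# Hypothetical outlier probabilities and ground-truth labels.
probabilities = [0.0, 0.1, 0.0, 0.2, 0.6, 0.9]
labels = [0, 0, 0, 0, 1, 1]

brier_outlier = stratified_brier(probabilities, labels, target=1)
brier_inlier = stratified_brier(probabilities, labels, target=0)

# Compare the quality for outliers against the quality for inliers,
# in the spirit of the comparison shown in Figure 2.
score = skill_score(brier_outlier, brier_inlier)
print(round(score, 3))  # negative here: outlier probabilities are worse
```

A negative value in this toy example means the probabilities for the outliers have a larger stratified Brier score than those for the inliers.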
