MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
Xu Tan, Jiawei Yang, Junqi Chen, Sylwan Rahardja, Susanto Rahardja
TL;DR
This work tackles unexpected reconstruction and overconfidence in autoencoder-based outlier detection (OD) in two ways: (1) it introduces a probabilistic autoencoder (PAE) that models per-dimension uncertainty with $\bm{\mu}$ and $\bm{\sigma}^2$, and (2) it weights the reconstruction loss with the Weighted Negative Logarithmic Likelihood (WNLL). To address the neglect of local data structure, it proposes Mean-Shift Scoring (MSS), which uses mean-shifted inputs $\mathbf{x}^{MS}(m,k)$ to compute more robust outlier scores via MSS-MSE or MSS-WNLL. Experiments on 32 real-world tabular OD datasets show that WNLL substantially improves detection performance over the standard MSE-based AE, and that MSS further boosts robustness by reducing false inliers. MSS-PAE achieves the best overall results and outperforms eight non-AE baselines by a large margin. The results indicate that combining uncertainty-aware reconstruction with local-structure information yields practical, transferable improvements for OD in real-world settings.
Abstract
AutoEncoders (AEs) are commonly used for machine learning tasks because of their intrinsic learning ability, a characteristic that can be capitalized on for Outlier Detection (OD). However, conventional AE-based methods suffer from overconfident decisions and unexpected reconstruction of outliers, which limits their performance in OD. To mitigate these issues, the Mean Squared Error (MSE) and Negative Logarithmic Likelihood (NLL) were first analyzed, and the importance of incorporating aleatoric uncertainty into AE-based OD was elucidated. The Weighted Negative Logarithmic Likelihood (WNLL) was then proposed to adjust the effect of uncertainty for different OD scenarios. Moreover, the Mean-Shift Scoring (MSS) method was proposed to exploit the local relationships in the data and thereby reduce the false inliers caused by AEs. Experiments on 32 real-world OD datasets demonstrated the effectiveness of the proposed methods. The combination of WNLL and MSS achieved a 41% relative performance improvement over the best baseline. In addition, MSS improved the detection performance of multiple AE-based outlier detectors by an average of 20%. The proposed methods have the potential to advance the development of AEs in OD.
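To make the contrast between MSE and NLL scoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes a decoder that outputs a per-dimension mean `mu` and variance `var` and uses the standard Gaussian negative log likelihood (constant term dropped). The weighting that turns NLL into WNLL is not specified in this abstract, so it is omitted. The `mean_shift` helper is one plausible reading of $\mathbf{x}^{MS}(m,k)$, namely replacing each sample by the mean of its $k$ nearest neighbours (itself included) for $m$ iterations; the function names and this interpretation are assumptions made for illustration.

```python
import numpy as np

def mse_score(x, mu):
    """Per-sample mean squared reconstruction error."""
    return np.mean((x - mu) ** 2, axis=1)

def gaussian_nll_score(x, mu, var, eps=1e-6):
    """Per-sample Gaussian NLL (constant dropped), averaged over dimensions.

    A large predicted variance down-weights the squared error in that
    dimension but is penalised through the log-variance term.
    """
    var = np.maximum(var, eps)  # guard against division by zero
    return np.mean(0.5 * (np.log(var) + (x - mu) ** 2 / var), axis=1)

def mean_shift(X, k=3, m=1):
    """Hypothetical helper: move each row to the mean of its k nearest
    neighbours (the row itself counts as a neighbour), repeated m times."""
    X_ms = np.asarray(X, dtype=float).copy()
    for _ in range(m):
        # Pairwise Euclidean distances, shape (n, n).
        dist = np.linalg.norm(X_ms[:, None, :] - X_ms[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]  # indices of k nearest rows
        X_ms = X_ms[nearest].mean(axis=1)
    return X_ms

# Toy data: three clustered points and one isolated point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
shift = np.linalg.norm(X - mean_shift(X, k=3, m=1), axis=1)
```

On this toy data the isolated point is displaced far more than the clustered points, which is the kind of local-structure signal that a plain reconstruction error does not see. How the paper combines the mean-shifted input with the AE output in MSS-MSE and MSS-WNLL is not stated in this abstract and is not reproduced here.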
