Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis
Alastair Anderberg, James Bailey, Ricardo J. G. B. Campello, Michael E. Houle, Henrique O. Marques, Miloš Radovanović, Arthur Zimek
TL;DR
The paper tackles outlier detection under varying local intrinsic dimensionality by introducing a nonparametric, dimensionality-aware scorer called DAO. DAO is grounded in Local Intrinsic Dimensionality (LID) theory and the asymptotic local density ratio (ALDR), which lets it adapt to the local geometry of the dataset when assessing outlierness. Empirical results on more than 800 synthetic and real datasets show that DAO outperforms three established baselines, LOF, Simplified LOF (SLOF), and kNN, particularly when LID varies substantially within a dataset; the work also evaluates how the choice of LID estimator affects performance. The findings suggest that accounting for local dimensionality leads to more robust and effective outlier detection in complex, high-dimensional settings, and a public codebase is provided to support replication and further research.
Abstract
We present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimates of LID values in a theoretically justified way. Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
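To make the idea of a density ratio driven by local LID estimates concrete, here is a minimal sketch under stated assumptions. It assumes the score of a query q averages (d_k(q) / d_k(o)) ** LID(o) over the k nearest neighbours o of q, where d_k is the distance to the k-th nearest neighbour. It uses the maximum-likelihood (Hill-type) LID estimator, brute-force Euclidean kNN, and no duplicate points. The score form, the estimator, and all function names (`knn`, `lid_mle`, `dao_scores`) are illustrative assumptions, not the authors' published code. Setting every exponent to 1 yields a Simplified-LOF-style ratio, which isolates the effect of the LID exponent.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn(data: List[Point], i: int, k: int) -> List[Tuple[float, int]]:
    """Brute-force k nearest neighbours of data[i] (excluding i itself),
    returned as (distance, index) pairs in increasing distance order."""
    pairs = [(math.dist(data[i], p), j) for j, p in enumerate(data) if j != i]
    pairs.sort()
    return pairs[:k]


def lid_mle(dists: List[float]) -> float:
    """MLE (Hill-type) LID estimate from ascending kNN distances:
    LID = -((1/k) * sum_i ln(d_i / d_k))^(-1), d_k being the largest distance.
    Assumes strictly positive distances (no duplicate points)."""
    k, d_k = len(dists), dists[-1]
    s = sum(math.log(d / d_k) for d in dists)  # every term is <= 0
    if s == 0.0:
        # all k distances are equal, so the estimator is undefined here
        raise ValueError("degenerate neighbourhood: all kNN distances are equal")
    return -k / s


def dao_scores(data: List[Point], k: int,
               dimensionality_aware: bool = True) -> List[float]:
    """Hypothetical DAO-style score: mean over o in N_k(q) of
    (d_k(q) / d_k(o)) ** LID(o). With dimensionality_aware=False every
    exponent is 1, giving a Simplified-LOF-style density ratio."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # d_k(x): distance to k-th neighbour
    lid = [lid_mle([d for d, _ in nb]) for nb in neigh]  # per-point LID estimate
    scores = []
    for q in range(n):
        total = 0.0
        for _, o in neigh[q]:
            exponent = lid[o] if dimensionality_aware else 1.0
            total += (k_dist[q] / k_dist[o]) ** exponent
        scores.append(total / len(neigh[q]))
    return scores


if __name__ == "__main__":
    # toy data: a 5x5 grid of inliers plus one distant point (index 25)
    data = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data.append((10.0, 10.0))
    dao = dao_scores(data, k=5)
    slof = dao_scores(data, k=5, dimensionality_aware=False)
    print("DAO top-ranked index:", max(range(len(data)), key=dao.__getitem__))
    print("SLOF top-ranked index:", max(range(len(data)), key=slof.__getitem__))
```

On this toy grid both variants rank the distant point (index 25) highest. The LID exponent changes how strongly each neighbour's k-distance ratio contributes: ratios above 1 are amplified more in neighbourhoods with a higher estimated LID. The brute-force neighbour search is quadratic in the number of points and is meant only for illustration.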
