A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections
Prodromos Kolyvakis, Aristidis Likas
TL;DR
This work tackles the problem of testing unimodality in multidimensional data by introducing mud-pod, a multivariate unimodality test for distributions in $ℝ^d$ under the $α$-unimodality family. The method computes Mahalanobis distances from random observer points over random projections (RPs), which approximately preserve pairwise distances by the Johnson–Lindenstrauss lemma. It applies the univariate dip test to each resulting one-dimensional view and combines the per-view outcomes via Monte Carlo to decide $H_0: X ∼ P_{α}$. The work provides a mathematical foundation (a Decomposition theorem plus translation, norm, and projection properties), shows that the test is consistent, and validates it empirically on synthetic and real-world datasets. The accompanying mp-means procedure further demonstrates automatic estimation of the number of clusters, with performance competitive with standard clustering methods. The results highlight how the RP space, percentile-based observer selection, and the Mahalanobis distance improve unimodality detection and clustering robustness, supporting practical use across diverse data domains.
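The step that turns a multivariate sample into one-dimensional views can be sketched in Python with NumPy. This is a minimal, non-authoritative sketch under stated assumptions. The Gaussian projection matrix with $N(0, 1/k)$ entries, computing the Mahalanobis distance inside the projected space with the sample covariance, and the observer rule in `pick_observer` are illustrative choices, not the authors' specification. All function names are hypothetical. The sketch stops at producing the one-dimensional samples: each would then go to a univariate dip test, and mud-pod's Monte Carlo combination rule is not reproduced here.

```python
import numpy as np


def random_projection(X, k, rng):
    """Map the rows of X (n x d) to k dimensions with a Gaussian matrix.

    Entries are N(0, 1/k), the textbook Johnson-Lindenstrauss construction.
    Assumption: the paper's projection scheme may differ.
    """
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


def mahalanobis_from_observer(Z, observer):
    """Mahalanobis distance of every row of Z from one observer point,
    using the sample covariance of Z (pseudo-inverse for robustness)."""
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(Z, rowvar=False)))
    diff = Z - observer
    # Quadratic form diff_i^T S^{-1} diff_i for each row i.
    q = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(q, 0.0, None))


def pick_observer(Z, percentile):
    """Hypothetical percentile rule: return the row whose Euclidean distance
    to the centroid is closest to the given percentile of those distances."""
    r = np.linalg.norm(Z - Z.mean(axis=0), axis=1)
    return Z[np.argmin(np.abs(r - np.percentile(r, percentile)))]


def distance_views(X, n_views=20, k=2, percentile=90.0, seed=0):
    """Build n_views one-dimensional samples, each holding the Mahalanobis
    distances from one observer inside one random projection of X."""
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(n_views):
        Z = random_projection(X, k, rng)
        views.append(mahalanobis_from_observer(Z, pick_observer(Z, percentile)))
    return views


if __name__ == "__main__":
    data = np.random.default_rng(42).standard_normal((500, 10))
    views = distance_views(data, n_views=5, k=3)
    # Each view is a length-500 array that a univariate dip test would receive.
    print(len(views), views[0].shape)
```

Measuring distances from an observer point, rather than projecting onto a single line, is what reduces each projected sample to one dimension. The Mahalanobis version is also affine-invariant: if the observer is transformed along with the data, rescaling or shearing the projected sample leaves each view unchanged.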
Abstract
Unimodality is central to statistical analysis: it offers insight into the structure of a dataset and underpins more sophisticated analytical procedures. Testing unimodality is straightforward for one-dimensional data, using methods such as Silverman's approach and Hartigans' dip statistic, but generalizing these tests to higher dimensions remains challenging. Our method extends one-dimensional unimodality principles to multidimensional spaces through linear random projections and point-to-point distances. Rooted in $α$-unimodality assumptions, it yields a novel multivariate unimodality test named mud-pod. Both theoretical and empirical studies confirm the efficacy of our method in assessing the unimodality of multidimensional datasets and in estimating the number of clusters.
