Z-Dip: a validated generalization of the Dip Test
Edoardo Di Martino, Matteo Cinelli, Roy Cerqueti
TL;DR
The Z-Dip generalizes Hartigan’s Dip Test by standardizing the Dip statistic against its null distribution, yielding a scale-free multimodality score $Z\text{-}Dip = z = \frac{Dip_{obs}-\mu_N}{\sigma_N}$, where $\mu_N$ and $\sigma_N$ are the mean and standard deviation of the Dip under the null at sample size $N$. The score is compared against a universal threshold $z \approx 1.975$ calibrated via simulation and bootstrap. This removes sample-size dependence, preserves the original test’s nonparametric nature and $O(n \log n)$ complexity, and provides lookup tables for rapid evaluation. The approach is validated on synthetic Gaussian mixtures and $117{,}457$ real-world opinion distributions, showing close agreement with Dip-based decisions and robust performance across $N$, with a downsampling correction to mitigate residual large-sample sensitivity. The result is a practical, interpretable, and scalable tool for detecting and quantifying multimodality in diverse datasets, accompanied by open-source implementations, that supports consistent multimodality analysis across studies whose sample sizes vary substantially.
Abstract
Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to social science. Hartigan's Dip Test is a classical nonparametric procedure for testing unimodality versus multimodality, but its interpretation is hindered by strong dependence on sample size and the need for lookup tables. We introduce the Z-Dip, a standardized extension of the Dip Test that removes sample-size dependence by comparing observed Dip values to simulated null distributions. We calibrate a universal decision threshold for Z-Dip via simulation and bootstrap resampling, providing a unified criterion for multimodality detection. In the final section, we also propose a downsampling-based approach to further mitigate residual sample-size effects in very large datasets. Lookup tables and software implementations are made available for efficient use in practice.
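The standardization step can be illustrated with a minimal Python sketch. It is not the authors' released implementation: the function names (`null_moments`, `z_dip`, `is_multimodal`), the pluggable `dip_fn` argument, the Uniform(0, 1) null (the reference distribution conventionally used for the Dip Test), and the use of the sample standard deviation are all assumptions made for illustration. Only the z-score formula and the threshold of about 1.975 are taken from the text.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

# Decision threshold quoted in the text (z ≈ 1.975).
Z_THRESHOLD = 1.975


def null_moments(
    dip_fn: Callable[[Sequence[float]], float],
    n: int,
    n_sim: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate (mu_N, sigma_N) of the Dip under a simulated null.

    Assumptions (not from the source): the null is Uniform(0, 1), and
    `dip_fn` is any function returning the Dip statistic of a sample.
    """
    if n_sim < 2:
        raise ValueError("n_sim must be at least 2 to estimate a spread")
    rng = random.Random(seed)
    dips = [dip_fn([rng.random() for _ in range(n)]) for _ in range(n_sim)]
    # Sample standard deviation; the population version is an equally
    # plausible choice and differs negligibly for large n_sim.
    return statistics.fmean(dips), statistics.stdev(dips)


def z_dip(dip_obs: float, mu_n: float, sigma_n: float) -> float:
    """Standardize an observed Dip: z = (Dip_obs - mu_N) / sigma_N."""
    if sigma_n <= 0:
        raise ValueError("sigma_n must be positive")
    return (dip_obs - mu_n) / sigma_n


def is_multimodal(z: float, threshold: float = Z_THRESHOLD) -> bool:
    """Flag multimodality when the standardized Dip exceeds the threshold."""
    return z > threshold
```

Given a sample `x` and some Dip implementation `dip_fn`, one would compute `mu, sigma = null_moments(dip_fn, len(x))` and then evaluate `is_multimodal(z_dip(dip_fn(x), mu, sigma))`. In practice the simulation step would presumably be replaced by a read from the lookup tables mentioned above, so that $\mu_N$ and $\sigma_N$ need not be re-estimated for every sample.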
