Density estimation via mixture discrepancy and moments
Zhengyang Lei, Lirong Qu, Sihong Shao, Yunfeng Xiong
TL;DR
This work addresses high-dimensional density estimation by replacing the star discrepancy used in Discrepancy-based Sequential Partition (DSP), which is NP-hard to compute, with more tractable uniformity measures: the mixture discrepancy (DSP-mix) and moment-based tests (MSP). Unlike the star discrepancy, both measures are reflection and rotation invariant, and both keep the tree-based, adaptive partitioning framework that learns piecewise-constant densities. Empirical results on Beta, Gaussian, and Cauchy mixtures in up to 30 dimensions show that MSP matches the accuracy of DSP while running 2 to 20 times faster for large sample sizes. DSP-mix is accurate and efficient in low dimensions ($d \le 6$) but may lose accuracy in higher dimensions because it reaches a lower partition level. The findings offer scalable, invariant density estimation techniques suitable for moderately high-dimensional data.
Abstract
With the aim of generalizing histogram statistics to higher-dimensional cases, density estimation via discrepancy-based sequential partition (DSP) has been proposed to learn an adaptive piecewise-constant approximation defined on a binary sequential partition of the underlying domain, where the star discrepancy is adopted to measure the uniformity of the particle distribution. However, computing the star discrepancy is NP-hard, and it satisfies neither reflection invariance nor rotation invariance. To this end, we use the mixture discrepancy and the comparison of moments as replacements for the star discrepancy, leading to density estimation via mixture-discrepancy-based sequential partition (DSP-mix) and density estimation via moment-based sequential partition (MSP), respectively. Both DSP-mix and MSP are computationally tractable and exhibit reflection and rotation invariance. Numerical experiments on reconstructing Beta mixtures, Gaussian mixtures and heavy-tailed Cauchy mixtures in up to 30 dimensions demonstrate that MSP maintains the same accuracy as DSP while running two to twenty times faster for large sample sizes. DSP-mix achieves satisfactory accuracy and improves efficiency in low-dimensional tests ($d \le 6$), but might lose accuracy in high-dimensional problems due to a reduction in partition level.
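To make the tractability claim concrete, the following is a minimal sketch of how a mixture discrepancy can be evaluated in closed form for N points in the unit cube, at O(N^2 d) cost. It assumes the closed-form expression commonly given for the mixture discrepancy in the uniform-design literature; the abstract itself does not state the formula, so the constants below are an assumption rather than a quotation from this work. The function name `mixture_discrepancy_sq` is hypothetical, and the sketch does not implement the sequential partition itself.

```python
import numpy as np


def mixture_discrepancy_sq(x):
    """Squared mixture discrepancy of points x in [0, 1]^d, shape (N, d).

    Assumed closed form (not taken from the abstract):
      (19/12)^d
      - (2/N)   * sum_i     prod_j (5/3 - |c_ij|/4 - c_ij^2/4)
      + (1/N^2) * sum_{i,k} prod_j (15/8 - |c_ij|/4 - |c_kj|/4
                                    - 3|x_ij - x_kj|/4 + (x_ij - x_kj)^2/2)
    with c_ij = x_ij - 1/2.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    c = np.abs(x - 0.5)  # distance of each coordinate from the cube centre

    # Single-sum term: one product over coordinates per point.
    single = np.prod(5.0 / 3.0 - 0.25 * c - 0.25 * c**2, axis=1).sum()

    # Double-sum term: pairwise kernel via broadcasting (uses O(N^2 d) memory).
    diff = np.abs(x[:, None, :] - x[None, :, :])
    kernel = (15.0 / 8.0
              - 0.25 * c[:, None, :]
              - 0.25 * c[None, :, :]
              - 0.75 * diff
              + 0.5 * diff**2)
    double = np.prod(kernel, axis=2).sum()

    return (19.0 / 12.0) ** d - 2.0 * single / n + double / n**2
```

Every term depends on the points only through |x_ij - 1/2| and |x_ij - x_kj|, so the value is unchanged when a coordinate is reflected about 1/2 or when coordinates are permuted. This illustrates the kind of invariance the abstract refers to, which the star discrepancy lacks because it is anchored at the origin.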
