Density estimation via mixture discrepancy and moments

Zhengyang Lei, Lirong Qu, Sihong Shao, Yunfeng Xiong

TL;DR

This work addresses high-dimensional density estimation by replacing the NP-hard star discrepancy in Discrepancy-based Sequential Partition (DSP) with more tractable measures: the mixture discrepancy (DSP-mix) and moment-based tests (MSP). Both approaches preserve reflection and rotation invariance and maintain a tree-based, adaptive partitioning framework to learn piecewise-constant densities. Empirical results on Beta, Gaussian, and Cauchy mixtures up to 30 dimensions show MSP achieves comparable accuracy to DSP with substantial speedups (2–20x at large N), while DSP-mix delivers strong performance in low dimensions. The findings offer scalable, invariant density estimation techniques suitable for moderately high-dimensional data, with guidance on partition depth and potential directions for further analysis.

Abstract

With the aim of generalizing histogram statistics to higher-dimensional cases, density estimation via discrepancy-based sequential partition (DSP) has been proposed to learn an adaptive piecewise constant approximation defined on a binary sequential partition of the underlying domain, where the star discrepancy is adopted to measure the uniformity of the particle distribution. However, computing the star discrepancy is NP-hard, and it satisfies neither reflection invariance nor rotation invariance. To this end, we use the mixture discrepancy and the comparison of moments as replacements for the star discrepancy, leading to density estimation via mixture discrepancy based sequential partition (DSP-mix) and density estimation via moment-based sequential partition (MSP), respectively. Both DSP-mix and MSP are computationally tractable and exhibit reflection and rotation invariance. Numerical experiments in reconstructing Beta mixtures, Gaussian mixtures and heavy-tailed Cauchy mixtures in up to 30 dimensions are conducted. They demonstrate that MSP can maintain the same accuracy as DSP while running two to twenty times faster for large sample sizes, and that DSP-mix can achieve satisfactory accuracy and boost efficiency in low-dimensional tests ($d \le 6$), but might lose accuracy in high-dimensional problems due to a reduction in partition level.
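
To make the tractability and invariance claims concrete, the following is a minimal sketch of the mixture discrepancy on the unit cube. It assumes the standard closed-form expression for the squared mixture discrepancy from the uniform-design literature; the function name `mixture_discrepancy` and the random test points are illustrative and not taken from the paper. The cost is $O(N^2 d)$, since the double sum runs over all pairs of points.

```python
import numpy as np


def mixture_discrepancy(points: np.ndarray) -> float:
    """Mixture discrepancy of N points in [0, 1]^d, given as an (N, d) array.

    Assumes the standard closed-form expression for the squared
    mixture discrepancy; this is a sketch, not the paper's code.
    """
    x = np.asarray(points, dtype=float)
    n, d = x.shape
    c = np.abs(x - 0.5)  # distance of each coordinate to the centre, (N, d)

    # Single-sum term: sum_i prod_j (5/3 - |c_ij|/4 - |c_ij|^2/4)
    single = np.prod(5.0 / 3.0 - 0.25 * c - 0.25 * c**2, axis=1).sum()

    # Double-sum term over all pairs (i, k), built by broadcasting to (N, N, d)
    diff = np.abs(x[:, None, :] - x[None, :, :])
    pair = np.prod(
        15.0 / 8.0
        - 0.25 * c[:, None, :]
        - 0.25 * c[None, :, :]
        - 0.75 * diff
        + 0.5 * diff**2,
        axis=2,
    ).sum()

    md2 = (19.0 / 12.0) ** d - 2.0 * single / n + pair / n**2
    return float(np.sqrt(max(md2, 0.0)))  # clip tiny negative round-off


# Illustrative check of the two invariances on hypothetical random points.
rng = np.random.default_rng(0)
pts = rng.random((200, 2))

reflected = pts.copy()
reflected[:, 0] = 1.0 - reflected[:, 0]  # reflect about x_1 = 0.5

# Rotate 90 degrees clockwise about the centre (0.5, 0.5): (x, y) -> (y, 1 - x)
rotated = np.column_stack([pts[:, 1], 1.0 - pts[:, 0]])

md = mixture_discrepancy(pts)
md_reflected = mixture_discrepancy(reflected)
md_rotated = mixture_discrepancy(rotated)
```

The three values agree up to floating-point round-off because the expression depends on the points only through $|x_{ij} - 1/2|$ and $|x_{ij} - x_{kj}|$, both of which are unchanged by reflecting a coordinate or permuting coordinates.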

Paper Structure

This paper contains 11 sections, 2 theorems, 25 equations, 15 figures, 12 tables, 1 algorithm.

Key Result

Theorem 2.1

For the partition $\Omega = \cup_{l=1}^L \Omega_l$, suppose $D^*(\widetilde{S}_l)\leq \theta\sqrt{N}/n_l$ in each subregion $\Omega_l$. Then it holds that […]. Thus, for any function $f$ with bounded variation $V_{HK}(f; \Omega)$ in the sense of Hardy and Krause, the transport cost has an upper bound […].
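
The condition in Theorem 2.1 can be read as a stopping rule for the binary sequential partition: a subregion $\Omega_l$ holding $n_l$ of the $N$ samples is kept as a leaf once the discrepancy of its rescaled samples falls below $\theta\sqrt{N}/n_l$. The sketch below illustrates that loop under several assumptions that are not from the paper: the mixture discrepancy (standard closed form) stands in for $D^*$, the split is always at the midpoint of the longest side, and the names `partition` and `max_depth` are hypothetical. The paper's actual split selection may differ.

```python
import numpy as np


def mixture_discrepancy(u: np.ndarray) -> float:
    """Mixture discrepancy of points in [0, 1]^d (standard closed form)."""
    n, d = u.shape
    c = np.abs(u - 0.5)
    single = np.prod(5 / 3 - 0.25 * c - 0.25 * c**2, axis=1).sum()
    diff = np.abs(u[:, None, :] - u[None, :, :])
    pair = np.prod(
        15 / 8 - 0.25 * c[:, None, :] - 0.25 * c[None, :, :]
        - 0.75 * diff + 0.5 * diff**2,
        axis=2,
    ).sum()
    md2 = (19 / 12) ** d - 2 * single / n + pair / n**2
    return float(np.sqrt(max(md2, 0.0)))


def partition(points, lo, hi, theta, n_total, max_depth=12, depth=0):
    """Return leaves as (lo, hi, count, density) tuples.

    Assumed rule: stop when the discrepancy of the rescaled samples is at
    most theta * sqrt(N) / n_l; otherwise halve the longest side.
    """
    n = len(points)
    vol = float(np.prod(hi - lo))
    leaf = (lo, hi, n, n / (n_total * vol))  # piecewise-constant density
    if n <= 1 or depth >= max_depth:
        return [leaf]

    rescaled = (points - lo) / (hi - lo)  # map the subregion to [0, 1]^d
    if mixture_discrepancy(rescaled) <= theta * np.sqrt(n_total) / n:
        return [leaf]

    j = int(np.argmax(hi - lo))           # longest side (assumption)
    mid = 0.5 * (lo[j] + hi[j])
    left = points[:, j] < mid
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[j] = mid
    lo_right[j] = mid
    return (
        partition(points[left], lo, hi_left, theta, n_total, max_depth, depth + 1)
        + partition(points[~left], lo_right, hi, theta, n_total, max_depth, depth + 1)
    )


# Hypothetical test data: 1000 samples from a product of Beta(2, 5) densities.
rng = np.random.default_rng(1)
samples = rng.beta(2.0, 5.0, size=(1000, 2))
leaves = partition(samples, np.zeros(2), np.ones(2), theta=0.2, n_total=len(samples))

total_mass = sum(dens * np.prod(h - l) for l, h, _, dens in leaves)
total_count = sum(cnt for _, _, cnt, _ in leaves)
```

Each leaf's density is $n_l / (N\,|\Omega_l|)$, so the estimator integrates to one by construction regardless of how the splits are chosen.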

Figures (15)

  • Figure 1: An illustration of tree-based density estimation.
  • Figure 2: Reflection invariance test, $d = 2, N = 3$: Panels (a) and (b) are symmetric about $x_1 = 0.5$. Intuitively, reflection should not change the uniformity of the points. However, the star discrepancies of panels (a) and (b) are different.
  • Figure 3: Rotation invariance test, $d = 2, N = 3$: Panel (b) is obtained by rotating panel (a) 90 degrees clockwise. Intuitively, rotation should not change the uniformity of the points. However, the star discrepancies of panels (a) and (b) are different.
  • Figure 4: 2-D Beta mixtures: The KL divergence and Hellinger distance under different $N$ and $\theta$.
  • Figure 5: 2-D Beta mixtures: Adaptive partitions and density estimators produced by DSP-mix, MSP and DSP with $N = 1\times 10^4$ and $\theta=0.2$.
  • ...and 10 more figures
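
The lack of reflection invariance illustrated in Figure 2 can be reproduced numerically with a brute-force star discrepancy. The sketch below assumes points in $[0, 1)^d$ and enumerates every anchored box whose corner lies on the grid formed by the point coordinates and 1, which costs on the order of $(N+1)^d$ box evaluations and is only practical for tiny examples. The single point used here is a hypothetical configuration, not the one shown in the figures.

```python
import itertools

import numpy as np


def star_discrepancy(points: np.ndarray) -> float:
    """Exact star discrepancy of N points in [0, 1)^d by grid enumeration."""
    x = np.asarray(points, dtype=float)
    n, d = x.shape
    # Candidate corner coordinates per axis: the point coordinates plus 1.
    grids = [np.unique(np.append(x[:, j], 1.0)) for j in range(d)]

    best = 0.0
    for corner in itertools.product(*grids):
        t = np.array(corner)
        vol = float(np.prod(t))
        n_open = int(np.all(x < t, axis=1).sum())     # points in [0, t)
        n_closed = int(np.all(x <= t, axis=1).sum())  # points in [0, t]
        best = max(best, vol - n_open / n, n_closed / n - vol)
    return best


point = np.array([[0.25, 0.5]])
mirrored = point.copy()
mirrored[:, 0] = 1.0 - mirrored[:, 0]  # reflect about x_1 = 0.5

d_original = star_discrepancy(point)     # 0.875
d_mirrored = star_discrepancy(mirrored)  # 0.75
```

The two values differ because the star discrepancy only uses boxes anchored at the origin, so a point set and its mirror image are measured against different families of boxes.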

Theorems & Definitions (3)

  • Theorem 2.1
  • proof
  • Lemma 3.1