Multivariate Density Estimation via Variance-Reduced Sketching

Yifan Peng; Yuehaw Khoo; Daren Wang

TL;DR

This work addresses nonparametric multivariate density estimation in high dimensions by introducing Variance-Reduced Sketching (VRS), which treats multivariate densities as infinite-size tensors and recovers their range through low-variance moments. The core idea reduces the problem to estimating low-dimensional, informative moments and then reconstructing the density via leading singular functions, achieving a reduced curse of dimensionality with a single-pass algorithm. Theoretical results establish consistency and rate guarantees under a spectral-gap assumption, with error bounds that scale as $O_P\left(\frac{\sqrt{\prod_{j=1}^d r_j}}{N^{\alpha/(2\alpha+1)}} + \xi^*\right)$, and simulations/real-data experiments show VRS outperforming KDEs and neural density estimators across diverse models. The work provides practical algorithms, tuning strategies (including adaptive rank selection), and public code, highlighting VRS's potential for high-dimensional density estimation in science and engineering.

Abstract

Multivariate density estimation is of great interest in various scientific and engineering disciplines. In this work, we introduce a new framework called Variance-Reduced Sketching (VRS), specifically designed to estimate multivariate density functions with a reduced curse of dimensionality. Our VRS framework conceptualizes multivariate functions as infinite-size matrices/tensors, and facilitates a new sketching technique motivated by the numerical linear algebra literature to reduce the variance in density estimation problems. We demonstrate the robust numerical performance of VRS through a series of simulated experiments and real-world data applications. Notably, VRS shows remarkable improvement over existing neural network density estimators and classical kernel methods in numerous distribution models. Additionally, we offer theoretical justifications for VRS to support its ability to deliver density estimation with a reduced curse of dimensionality.
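The range-recovery idea behind sketching can be illustrated in a finite-dimensional setting. The following is a minimal sketch under stated assumptions, not the paper's VRS algorithm: it uses a generic Gaussian random sketch matrix `S` (an assumption; VRS chooses its sketching functions to reduce variance), and a synthetic exactly low-rank matrix `A` standing in for a discretized function $A(x, y)$. All names (`A`, `S`, `U_hat`, `range_error`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, k = 50, 2000, 3, 6  # A is n x m with rank r; sketch width k >= r

# Build an exactly rank-r matrix A = U diag(3, 2, 1) V^T with orthonormal U, V.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
A = U @ np.diag([3.0, 2.0, 1.0]) @ V.T

# Generic Gaussian sketch (assumption): compress the m columns down to k.
S = rng.standard_normal((m, k))
AS = A @ S  # n x k, far smaller than A when k << m

# The leading r left singular vectors of AS estimate the range (column space) of A.
Q, s, _ = np.linalg.svd(AS, full_matrices=False)
U_hat = Q[:, :r]

# Compare the two subspaces through their orthogonal projectors.
P_true = U @ U.T
P_hat = U_hat @ U_hat.T
range_error = np.linalg.norm(P_true - P_hat, 2)
print(f"spectral-norm distance between projectors: {range_error:.2e}")
```

In this noiseless setting the sketched matrix `AS` has the same column space as `A`, so the projector distance is at the level of floating-point round-off. With noisy data, the appeal of working with the small matrix `AS` is that it has far fewer entries to estimate than `A` itself.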

Paper Structure

This paper contains 35 sections, 41 theorems, 281 equations, 8 figures, 3 tables, 3 algorithms.

Introduction
Variance-Reduced Sketching
Related literature
Organization
Notations
Background: linear algebra in function spaces
Density range estimation by sketching
Ranges and ranks of functions
The sketching algorithm
Density estimation by sketching
Simulations and real data examples
Implementations
Examples of exactly and approximately low-rank functions
Conclusion
Tensors and multivariable functions
...and 20 more sections

Key Result

Theorem 1

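[Singular value decomposition in function space] Let $B(x, y):\Omega_1 \times \Omega_2 \to \mathbb R$ be any function such that $\|B\|_{{\bf L}_2(\Omega_1 \times \Omega_2)} < \infty$. There exists a collection of strictly positive singular values $\{ \sigma_{\rho}(B) \}_{\rho=1}^{r} \subset \mathbb R^+$ and two collections of orthonormal functions $\{ \Phi_{\rho}(x) \}_{\rho=1}^{r} \subset {\bf L}_2(\Omega_1)$ and $\{ \Psi_{\rho}(y) \}_{\rho=1}^{r} \subset {\bf L}_2(\Omega_2)$, where $r \in \mathbb N \cup \{+\infty\}$, such that $B(x, y) = \sum_{\rho=1}^{r} \sigma_{\rho}(B) \Phi_{\rho}(x) \Psi_{\rho}(y)$. In this case, we say that the rank of $B(x, y)$ is $r$.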
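The singular values and rank in Theorem 1 can be approximated numerically by sampling $B$ on a grid. The snippet below is a minimal sketch, assuming a midpoint quadrature rule on $[0,1]^2$ and a hypothetical helper `numerical_rank`; the test function $B(x, y) = 1 + \cos(\pi x)\cos(\pi y)$ is an illustrative choice (a valid density with rank 2), not an example taken from the paper.

```python
import numpy as np

def numerical_rank(B_vals, wx, wy, tol=1e-10):
    """Approximate the L2 singular values and rank of B(x, y).

    B_vals[i, j] = B(x_i, y_j); wx and wy are quadrature weights for the
    x- and y-grids. Hypothetical helper, written for illustration only.
    """
    # Scaling by sqrt(weights) makes the matrix SVD mimic the L2 inner products.
    M = np.sqrt(wx)[:, None] * B_vals * np.sqrt(wy)[None, :]
    s = np.linalg.svd(M, compute_uv=False)
    rank = int(np.sum(s > tol * s[0]))
    return s, rank

n = 200
x = (np.arange(n) + 0.5) / n   # midpoint nodes on [0, 1]
y = x.copy()
w = np.full(n, 1.0 / n)        # midpoint weights

X, Y = np.meshgrid(x, y, indexing="ij")
# Sum of two separable terms, so the rank is 2.
B = 1.0 + np.cos(np.pi * X) * np.cos(np.pi * Y)

s, r = numerical_rank(B, w, w)
print("leading singular values:", s[:3])
print("numerical rank:", r)
```

Because $1$ and $\cos(\pi x)$ are orthogonal in ${\bf L}_2([0,1])$ with norms $1$ and $1/\sqrt{2}$, the two nonzero singular values of this $B$ are $1$ and $1/2$, which the discretization reproduces up to round-off.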

Figures (8)

Figure 1: Density estimation with data sampled from the Ginzburg-Landau density. The $x$-axis represents dimensionality, varying from $2$ to $10$. NN-MAF corresponds to the Masked Autoregressive Flow method (Papamakarios et al., 2017) and NN-NAF corresponds to the Neural Autoregressive Flows method (Huang et al., 2018). The performance of different estimators is evaluated using ${\bf L}_2$-errors. Additional details are provided in Simulation $\mathbf{III}$ of the numerical experiments section.
Figure 2: The sketched function $AS$ by VRS retains the range in the variable $x$ of $A (x, y)$. The complexity of estimating the range of $A$ using $AS$ is much lower than the complexity of directly estimating $A$.
Figure 3: Density functions from the two-dimensional Gaussian mixture model in Simulation $\mathbf{IV}$. From left to right are the ground truth density, estimates from VRS, KDE, MAF, and NAF, respectively. The values in the colorbar on the right represent function values.
Figure 4: Density functions from the two-dimensional Gaussian mixture model in Simulation $\mathbf{I}$. From left to right: the ground truth density, and estimates from VRS, KDE, MAF, and NAF. The values in the colorbar on the right represent function values.
Figure 5: Marginal densities from the 30-dimensional Gaussian mixture model in Simulation $\mathbf{II}$. From left to right: the ground truth density, and estimates from VRS, KDE, MAF, and NAF. From top to bottom: two-dimensional marginal densities corresponding to $(x_1, x_2)$, $(x_4, x_8)$, and $(x_{10}, x_{20})$. The values in the colorbar on the right represent function values.
...and 3 more figures

Theorems & Definitions (90)

Theorem 1
Theorem 2
Theorem 3
Example 1: Additive models in regression
Example 2: Mean-field models in density estimation
Example 3: Multivariate Taylor expansion
Remark 1
Lemma 1
proof
Definition 1: Coefficient tensors
...and 80 more
