Multivariate Density Estimation via Variance-Reduced Sketching

Yifan Peng, Yuehaw Khoo, Daren Wang

TL;DR

This work addresses nonparametric multivariate density estimation in high dimensions by introducing Variance-Reduced Sketching (VRS), which treats multivariate densities as infinite-size tensors and recovers their range through low-variance moments. The core idea reduces the problem to estimating low-dimensional, informative moments and then reconstructing the density via leading singular functions, achieving a reduced curse of dimensionality with a single-pass algorithm. Theoretical results establish consistency and rate guarantees under a spectral-gap assumption, with error bounds that scale as $O_P\left(\frac{\sqrt{\prod_{j=1}^d r_j}}{N^{\alpha/(2\alpha+1)}} + \xi^*\right)$, and simulations/real-data experiments show VRS outperforming KDEs and neural density estimators across diverse models. The work provides practical algorithms, tuning strategies (including adaptive rank selection), and public code, highlighting VRS's potential for high-dimensional density estimation in science and engineering.

Abstract

Multivariate density estimation is of great interest in various scientific and engineering disciplines. In this work, we introduce a new framework called Variance-Reduced Sketching (VRS), specifically designed to estimate multivariate density functions with a reduced curse of dimensionality. Our VRS framework conceptualizes multivariate functions as infinite-size matrices/tensors, and facilitates a new sketching technique motivated by the numerical linear algebra literature to reduce the variance in density estimation problems. We demonstrate the robust numerical performance of VRS through a series of simulated experiments and real-world data applications. Notably, VRS shows remarkable improvement over existing neural network density estimators and classical kernel methods in numerous distribution models. Additionally, we offer theoretical justifications for VRS to support its ability to deliver density estimation with a reduced curse of dimensionality.

Paper Structure

This paper contains 35 sections, 41 theorems, 281 equations, 8 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

[Singular value decomposition in function space] Let $B(x, y):\Omega_1 \times \Omega_2 \to \mathbb R$ be any function such that $\|B\|_{{\bf L}_2(\Omega_1 \times \Omega_2)} < \infty$. There exists a collection of strictly positive singular values $\{ \sigma_{\rho}(B) \}_{\rho=1}^{r} \subset \mathbb R^+$ [...] In this case, we say that the rank of $B(x, y)$ is $r$.
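A finite-dimensional analogue may help make the notion of rank in Theorem 1 concrete. The sketch below is an illustration only, not code from the paper: it assumes the standard expansion $B(x,y)=\sum_{\rho=1}^{r}\sigma_{\rho}(B)\,\Phi_{\rho}(x)\Psi_{\rho}(y)$ with orthonormal singular functions, discretizes a hypothetical test function `B_example` on a midpoint grid over $[0,1]^2$, and rescales the matrix SVD so that it approximates the ${\bf L}_2$ quantities. The names `discretized_svd` and `numerical_rank`, the grid size, and the tolerance are all assumptions made for this example.

```python
import numpy as np

def discretized_svd(B, n=200):
    """Approximate the function-space SVD of B on [0,1]^2 with a midpoint grid."""
    h = 1.0 / n                               # grid spacing (quadrature weight)
    x = (np.arange(n) + 0.5) * h              # midpoints in x
    y = (np.arange(n) + 0.5) * h              # midpoints in y
    M = B(x[:, None], y[None, :])             # M[i, j] = B(x_i, y_j)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Rescale so singular vectors are orthonormal under the discrete L2
    # inner product <f, g> = h * sum_i f(x_i) g(x_i).
    Phi = U / np.sqrt(h)                      # columns approximate Phi_rho(x)
    Psi_t = Vt / np.sqrt(h)                   # rows approximate Psi_rho(y)
    sigma = s * h                             # approximate sigma_rho(B)
    return x, y, Phi, sigma, Psi_t

def numerical_rank(sigma, rel_tol=1e-10):
    """Count singular values above a relative tolerance."""
    return int(np.sum(sigma > rel_tol * sigma[0]))

def B_example(x, y):
    # Hypothetical rank-2 function: a sum of two separable terms.
    return np.sin(np.pi * x) * np.cos(np.pi * y) + 0.5 * x * y

x, y, Phi, sigma, Psi_t = discretized_svd(B_example)
r = numerical_rank(sigma)

# Rebuild B on the grid from its leading r singular triples.
B_grid = B_example(x[:, None], y[None, :])
B_rebuilt = (Phi[:, :r] * sigma[:r]) @ Psi_t[:r, :]
max_abs_error = np.max(np.abs(B_grid - B_rebuilt))
print("numerical rank:", r)
print("max reconstruction error:", max_abs_error)
```

Because `B_example` is a sum of two separable terms with linearly independent factors, only two singular values are non-negligible, and the two leading singular triples reproduce the function on the grid up to floating-point error.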

Figures (8)

  • Figure 1: Density estimation with data sampled from the Ginzburg-Landau density in \ref{eq:GL density}. The $x$-axis represents dimensionality, varying from $2$ to $10$. NN-MAF corresponds to the Masked Autoregressive Flow method (Papamakarios et al., 2017) and NN-NAF corresponds to the Neural Autoregressive Flows method (Huang et al., 2018). The performance of different estimators is evaluated using ${\bf L}_2$-errors. Additional details are provided in Simulation $\mathbf{III}$ of \ref{sec: numerical experiments}.
  • Figure 2: The sketched function $AS$ by VRS retains the range in the variable $x$ of $A (x, y)$. The complexity of estimating the range of $A$ using $AS$ is much lower than the complexity of directly estimating $A$.
  • Figure 3: Density functions from the two-dimensional Gaussian mixture model in Simulation $\mathbf{IV}$. From left to right are the ground truth density, estimates from VRS, KDE, MAF, and NAF, respectively. The values in the colorbar on the right represent function values.
  • Figure 4: Density functions from the two-dimensional Gaussian mixture model in Simulation $\mathbf{I}$. From left to right: the ground truth density, and estimates from VRS, KDE, MAF, and NAF. The values in the colorbar on the right represent function values.
  • Figure 5: Marginal densities from the 30-dimensional Gaussian mixture model in Simulation $\mathbf{II}$. From left to right: the ground truth density, and estimates from VRS, KDE, MAF, and NAF. From top to bottom: two-dimensional marginal densities corresponding to $(x_1, x_2)$, $(x_4, x_8)$, and $(x_{10}, x_{20})$. The values in the colorbar on the right represent function values.
  • ...and 3 more figures
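
The range-retention property described in the Figure 2 caption can be illustrated in finite dimensions. The following is a minimal sketch under stated assumptions, not the VRS procedure itself: it uses a generic Gaussian sketch matrix `S` (the standard randomized range finder from numerical linear algebra), whereas the paper's choice of sketching functions is not specified in this summary. The sizes `m`, `n`, `r`, `k` and the helper name `sketched_range` are hypothetical.

```python
import numpy as np

def sketched_range(A, k, r, rng):
    """Estimate an orthonormal basis for the column range of A from the sketch A @ S."""
    n = A.shape[1]
    S = rng.standard_normal((n, k))           # generic Gaussian sketch (an assumption)
    AS = A @ S                                # m x k, far smaller than A when k << n
    U, _, _ = np.linalg.svd(AS, full_matrices=False)
    return U[:, :r]                           # leading r left singular vectors

rng = np.random.default_rng(0)
m, n, r, k = 300, 400, 3, 5                   # hypothetical sizes, with k >= r

# Exactly rank-r matrix standing in for a discretized A(x, y).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

Q = sketched_range(A, k, r, rng)

# If range(A @ S) equals range(A), projecting A onto span(Q) loses nothing.
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
print("relative projection residual:", residual)
```

In this noiseless setting the range is recovered from an `m x k` object instead of the full `m x n` matrix, which is the finite-dimensional counterpart of the complexity reduction the caption refers to.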

Theorems & Definitions (90)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Example 1: Additive models in regression
  • Example 2: Mean-field models in density estimation
  • Example 3: Multivariate Taylor expansion
  • Remark 1
  • Lemma 1
  • proof
  • Definition 1: Coefficient tensors
  • ...and 80 more