Minimax optimal density estimation using a shallow generative model with a one-dimensional latent variable

Hyeok Kyu Kwon, Minwoo Chae

TL;DR

This paper studies statistical properties of the implicit density estimator pursued by VAE-type methods within a nonparametric density estimation framework, and shows that a near minimax optimal rate with respect to the Hellinger metric can be achieved by the simplest network architecture.

Abstract

A deep generative model yields an implicit estimator for the unknown distribution or density function of the observation. This paper investigates some statistical properties of the implicit density estimator pursued by VAE-type methods from a nonparametric density estimation framework. More specifically, we obtain convergence rates of the VAE-type density estimator under the assumption that the underlying true density function belongs to a locally Hölder class. Remarkably, a near minimax optimal rate with respect to the Hellinger metric can be achieved by the simplest network architecture, a shallow generative model with a one-dimensional latent variable.

Paper Structure

This paper contains 11 sections, 10 theorems, 109 equations, 3 figures.

Key Result

Lemma 3.1

For any density function $p_0 \in \mathcal{C}^{\beta, L, \tau_0}({\mathbb R}^{d})$ satisfying assumptions (Tail 1) and (Tail 2), and small enough $\sigma > 0$, there exists a discrete probability measure $H(\cdot) = \sum_{i=1}^{N} w^{(i)} \delta_{{\bf x}^{(i)}}(\cdot)$ supported within a compact set and $N \lesssim \sigma^{-d} \{ \log (1/\sigma)\}^{\tau_3 d+ d}$, where $C = C({\rm all})$.

Figures (3)

  • Figure 1: (a) A finite mixture $\phi_\sigma * H$ can be represented as $P_{\widetilde{{\bf g}}, \sigma}$ for some function $\widetilde{{\bf g}}: [0,1] \to {\mathbb R}^d$. (b) If $\widetilde{{\bf g}}$ is a sum of $N$ indicator functions, it can be approximated by a shallow ReLU network function with $O(N)$ units.
  • Figure 2: The means and standard deviations of the squared Hellinger distances and training log-likelihood values. All results are based on 50 repetitions.
  • Figure 3: Step function approximation with a ReLU network.
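
The construction described in the captions of Figures 1 and 3 can be illustrated with a small numerical sketch. It is not the authors' code and rests on several assumptions: $P_{{\bf g}, \sigma}$ is taken to be the law of ${\bf g}(Z) + \sigma \epsilon$ with $Z$ uniform on $[0,1]$ and $\epsilon$ standard Gaussian, $\phi_\sigma$ is taken to be the Gaussian density with standard deviation $\sigma$, and the names `step_generator`, `relu_generator`, `sample`, and the ramp width `delta` are hypothetical. The step function sends $Z$ to the atom ${\bf x}^{(i)}$ with probability $w^{(i)}$, and each of its $N-1$ jumps is replaced by a narrow linear ramp built from two ReLU units, which gives $2(N-1) = O(N)$ hidden units.

```python
import numpy as np


def relu(t):
    return np.maximum(t, 0.0)


def step_generator(z, atoms, weights):
    """Piecewise-constant map g: [0,1] -> R^d such that g(Z) ~ H for Z ~ Uniform[0,1].

    atoms has shape (N, d); weights has shape (N,) and sums to one.
    """
    cuts = np.cumsum(weights)[:-1]                # interior breakpoints c_1 < ... < c_{N-1}
    idx = np.searchsorted(cuts, z, side="right")  # index of the interval containing z
    return atoms[idx]


def relu_generator(z, atoms, weights, delta):
    """Shallow ReLU network with 2(N-1) hidden units approximating step_generator.

    Each jump at c_i is replaced by a linear ramp on [c_i - delta, c_i].
    delta is an illustrative parameter; it should be below min(weights).
    """
    cuts = np.cumsum(weights)[:-1]
    jumps = np.diff(atoms, axis=0)                # (N-1, d) jump sizes
    z = np.asarray(z, dtype=float)[:, None]       # (n, 1)
    # Difference of two ReLUs: 0 left of the ramp, 1 right of it, linear inside.
    ramps = (relu(z - (cuts - delta)) - relu(z - cuts)) / delta   # (n, N-1)
    return atoms[0] + ramps @ jumps


def sample(generator, n, sigma, rng, **kwargs):
    """Draw n points from the assumed model X = g(Z) + sigma * eps."""
    z = rng.uniform(size=n)
    g = generator(z, **kwargs)
    return g + sigma * rng.standard_normal(g.shape)


# Toy discrete measure H with N = 3 atoms in R^2 (values chosen for illustration only).
atoms = np.array([[-2.0, 0.0], [0.0, 1.0], [3.0, -1.0]])
weights = np.array([0.2, 0.5, 0.3])
delta = 1e-3

z_grid = np.linspace(0.0, 1.0, 10001)
g_step = step_generator(z_grid, atoms, weights)
g_relu = relu_generator(z_grid, atoms, weights, delta)

# The two maps can differ only on the ramps, whose total length is (N-1) * delta.
mismatch = np.any(np.abs(g_step - g_relu) > 1e-9, axis=1)
print("fraction of grid points where the maps differ:", mismatch.mean())

rng = np.random.default_rng(0)
x = sample(relu_generator, 1000, sigma=0.1, rng=rng,
           atoms=atoms, weights=weights, delta=delta)
print("sample shape:", x.shape)
```

Because the two maps agree except on a set of Lebesgue measure at most $(N-1)\,\texttt{delta}$, shrinking `delta` makes the distribution of the network output arbitrarily close to the finite mixture, at the cost of larger network weights (of order $1/\texttt{delta}$).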

Theorems & Definitions (10)

  • Lemma 3.1
  • Theorem 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Lemma A.1
  • Lemma A.2
  • Corollary A.1
  • Lemma A.3
  • Lemma A.4
  • Lemma B.1