Minimax optimal density estimation using a shallow generative model with a one-dimensional latent variable

Hyeok Kyu Kwon; Minwoo Chae

TL;DR

This paper investigates statistical properties of the implicit density estimator pursued by VAE-type methods within a nonparametric density estimation framework, and shows that a near minimax optimal rate with respect to the Hellinger metric can be achieved by the simplest network architecture.

Abstract

A deep generative model yields an implicit estimator for the unknown distribution or density function of the observation. This paper investigates some statistical properties of the implicit density estimator pursued by VAE-type methods within a nonparametric density estimation framework. More specifically, we obtain convergence rates of the VAE-type density estimator under the assumption that the underlying true density function belongs to a locally Hölder class. Remarkably, a near minimax optimal rate with respect to the Hellinger metric can be achieved by the simplest network architecture: a shallow generative model with a one-dimensional latent variable.
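To make the objects in the abstract concrete, here is a minimal pure-Python sketch of a shallow generative model with a one-dimensional latent variable. It assumes (this is our assumption, not a restatement of the paper's exact setup) that the latent variable is $Z \sim \mathrm{Uniform}[0,1]$ and that an observation is $X = {\bf g}(Z) + \sigma \epsilon$ with $\epsilon \sim N(0, I_d)$, so that the implied density is $p_{{\bf g},\sigma}({\bf x}) = \int_0^1 \phi_\sigma({\bf x} - {\bf g}(z))\,dz$. The helpers `shallow_generator`, `model_density`, `log_likelihood`, `sample` and `squared_hellinger_1d` are hypothetical names. The $z$-integral is approximated by a midpoint rule, and the squared Hellinger distance uses the convention $\int (\sqrt{p} - \sqrt{q})^2\,dx$; some authors include a factor $1/2$.

```python
import math
import random

# Assumed model (illustrative, not taken verbatim from the paper):
#   Z ~ Uniform[0, 1],  X = g(Z) + sigma * eps,  eps ~ N(0, I_d).

def relu(t):
    return max(t, 0.0)

def shallow_generator(z, slopes, biases, out_coefs, out_bias):
    """Hypothetical one-hidden-layer ReLU network g: [0, 1] -> R^d.

    Hidden unit k computes relu(slopes[k] * z + biases[k]); out_coefs[k] is the
    d-vector it contributes to the output, and out_bias is a d-vector offset.
    """
    hidden = [relu(a * z + b) for a, b in zip(slopes, biases)]
    return [out_bias[j] + sum(h * c[j] for h, c in zip(hidden, out_coefs))
            for j in range(len(out_bias))]

def gaussian_density(x, mean, sigma):
    """Density of N(mean, sigma^2 I_d) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2) ** (d / 2)

def model_density(x, g, sigma, n_grid=2000):
    """p_{g,sigma}(x) = int_0^1 phi_sigma(x - g(z)) dz, via the midpoint rule in z."""
    total = sum(gaussian_density(x, g((i + 0.5) / n_grid), sigma) for i in range(n_grid))
    return total / n_grid

def log_likelihood(data, g, sigma, n_grid=2000):
    """Log-likelihood of the observations under p_{g,sigma}."""
    return sum(math.log(model_density(x, g, sigma, n_grid)) for x in data)

def sample(g, sigma, d, rng):
    """Draw one observation X = g(Z) + sigma * eps with Z uniform on [0, 1]."""
    gz = g(rng.random())
    return [gz[j] + sigma * rng.gauss(0.0, 1.0) for j in range(d)]

def squared_hellinger_1d(p, q, lo, hi, n=20000):
    """int_lo^hi (sqrt(p) - sqrt(q))^2 dx for univariate densities, midpoint rule."""
    step = (hi - lo) / n
    xs = (lo + (i + 0.5) * step for i in range(n))
    return step * sum((math.sqrt(p(x)) - math.sqrt(q(x))) ** 2 for x in xs)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical generator with two hidden units, mapping [0, 1] into R^2.
    g = lambda z: shallow_generator(z, [1.0, 1.0], [0.0, -0.5],
                                    [[2.0, 0.0], [-4.0, 1.0]], [0.0, 0.0])
    data = [sample(g, 0.1, 2, rng) for _ in range(5)]
    print("log-likelihood:", log_likelihood(data, g, 0.1))
```

A likelihood-based estimator would maximize `log_likelihood` over a class of generators ${\bf g}$. For very small $\sigma$ the quadrature sum can underflow to zero, in which case `math.log` raises; a log-sum-exp formulation is the usual remedy.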

Paper Structure (11 sections, 10 theorems, 109 equations, 3 figures)

INTRODUCTION
Notations and Definitions
A LIKELIHOOD APPROACH TO DEEP GENERATIVE MODELS
MAIN RESULTS
Assumptions on True Density Function
Convergence Rate of a Sieve MLE
AN ALTERNATIVE PROOF AND STRUCTURED DENSITY ESTIMATION
NUMERICAL EXPERIMENTS
CONCLUSIONS
PROOF OF THE MAIN THEOREM
PROOF OF THE THEOREM ON STRUCTURED DENSITY ESTIMATION

Key Result

Lemma 3.1

For any density function $p_0 \in \mathcal{C}^{\beta, L, \tau_0}({\mathbb R}^{d})$ satisfying assumptions (Tail 1) and (Tail 2), and any sufficiently small $\sigma > 0$, there exists a discrete probability measure $H(\cdot) = \sum_{i=1}^{N} w^{(i)} \delta_{{\bf x}^{(i)}}(\cdot)$, supported within a compact set, with $N \lesssim \sigma^{-d} \{ \log (1/\sigma)\}^{\tau_3 d+ d}$, where $C = C({\rm all})$.
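The measure $H$ in Lemma 3.1 enters through the Gaussian location mixture $(\phi_\sigma * H)({\bf x}) = \sum_{i=1}^N w^{(i)} \phi_\sigma({\bf x} - {\bf x}^{(i)})$, where we take $\phi_\sigma$ to be the $N(0, \sigma^2 I_d)$ density, as in the notation $\phi_\sigma * H$ of Figure 1. The sketch below evaluates this mixture and, for intuition only, the quantity $\sigma^{-d}\{\log(1/\sigma)\}^{\tau_3 d + d}$ that bounds the number of atoms up to a constant, with that constant set to 1. The function names, the example atoms, and the value of `tau3` are illustrative assumptions.

```python
import math

def gaussian_density(x, mean, sigma):
    """Density of N(mean, sigma^2 I_d) evaluated at x, i.e. phi_sigma(x - mean)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2) ** (d / 2)

def mixture_density(x, weights, atoms, sigma):
    """(phi_sigma * H)(x) for the discrete measure H = sum_i w_i * delta_{x_i}."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be a probability vector")
    return sum(w * gaussian_density(x, a, sigma) for w, a in zip(weights, atoms))

def atom_budget(sigma, d, tau3):
    """sigma^{-d} * {log(1/sigma)}^{tau3*d + d}: order of N in Lemma 3.1, constant set to 1."""
    return sigma ** (-d) * math.log(1.0 / sigma) ** (tau3 * d + d)

if __name__ == "__main__":
    # Hypothetical two-atom measure on R^1.
    print(mixture_density([0.0], [0.25, 0.75], [[0.0], [1.0]], 1.0))
    # How the atom budget grows as sigma shrinks (d = 2, tau3 = 1 chosen arbitrarily).
    for s in (0.1, 0.05, 0.01):
        print(s, atom_budget(s, d=2, tau3=1.0))
```

The budget grows like $\sigma^{-d}$ up to logarithmic factors, which is why the number of mixture components, and hence the network width in the constructions of Figure 1, must increase as $\sigma$ decreases.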

Figures (3)

Figure 1: (a) A finite mixture $\phi_\sigma * H$ can be represented as $P_{\widetilde{{\bf g}}, \sigma}$ for some function $\widetilde{{\bf g}}: [0,1] \to {\mathbb R}^d$. (b) If $\widetilde{{\bf g}}$ is a sum of $N$ indicator functions, it can be approximated by a shallow ReLU network function with $O(N)$ units.
Figure 2: The means and standard deviations of the squared Hellinger distances and training log-likelihood values. All results are based on 50 repetitions.
Figure 3: Step function approximation with a ReLU network.
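Figures 1 and 3 rest on two elementary constructions, sketched below under our own assumptions (a uniform latent variable on $[0,1]$ and hypothetical helper names). First, if $[0,1]$ is split into consecutive intervals of lengths $w^{(1)}, \dots, w^{(N)}$ and $\widetilde{{\bf g}}$ equals ${\bf x}^{(i)}$ on the $i$-th interval, then $\int_0^1 \phi_\sigma({\bf x} - \widetilde{{\bf g}}(z))\,dz = \sum_i w^{(i)} \phi_\sigma({\bf x} - {\bf x}^{(i)})$, so $P_{\widetilde{{\bf g}},\sigma}$ has density $\phi_\sigma * H$. Second, each jump of $\widetilde{{\bf g}}$ at a breakpoint $c$ can be replaced by the ramp $\{\mathrm{ReLU}(z - c + \delta) - \mathrm{ReLU}(z - c)\}/\delta$, which equals the indicator $1\{z \ge c\}$ outside a window of width $\delta$. This uses $2(N-1)$ hidden units, consistent with the $O(N)$ count in Figure 1(b). The ramp is one standard choice and is not necessarily the exact construction used in the paper.

```python
import bisect
import itertools
import math

def gaussian_density(x, mean, sigma):
    """Density of N(mean, sigma^2 I_d) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2) ** (d / 2)

def step_generator(weights, atoms):
    """g_tilde: [0, 1] -> R^d equal to atoms[i] on the i-th interval, of length weights[i]."""
    cuts = list(itertools.accumulate(weights))  # right endpoints of the intervals
    def g_tilde(z):
        i = bisect.bisect_right(cuts, z)        # number of right endpoints <= z
        return atoms[min(i, len(atoms) - 1)]    # z = 1 belongs to the last interval
    return g_tilde

def pushforward_density(x, g, sigma, n_grid=1000):
    """int_0^1 phi_sigma(x - g(z)) dz via the midpoint rule in z."""
    return sum(gaussian_density(x, g((i + 0.5) / n_grid), sigma) for i in range(n_grid)) / n_grid

def relu(t):
    return max(t, 0.0)

def relu_step_network(weights, atoms, delta):
    """Shallow ReLU network approximating step_generator(weights, atoms).

    Returns (g, n_units). Each interior breakpoint c uses two hidden units that
    form the ramp (relu(z - c + delta) - relu(z - c)) / delta, scaled by the jump.
    """
    cuts = list(itertools.accumulate(weights))[:-1]  # interior breakpoints c_1, ..., c_{N-1}
    units = []  # (bias, output coefficients) for hidden units relu(z + bias)
    for c, left, right in zip(cuts, atoms[:-1], atoms[1:]):
        jump = [r - l for l, r in zip(left, right)]
        units.append((delta - c, [j / delta for j in jump]))
        units.append((-c, [-j / delta for j in jump]))
    def g(z):
        out = list(atoms[0])
        for bias, coef in units:
            h = relu(z + bias)
            for k in range(len(out)):
                out[k] += coef[k] * h
        return out
    return g, len(units)

if __name__ == "__main__":
    w, xs = [0.25, 0.5, 0.25], [[0.0], [2.0], [-1.0]]  # hypothetical weights and atoms
    g_step = step_generator(w, xs)
    mix = sum(wi * gaussian_density([0.7], xi, 0.3) for wi, xi in zip(w, xs))
    # Agree up to floating-point rounding: the breakpoints fall between grid midpoints.
    print(pushforward_density([0.7], g_step, 0.3), mix)
    g_relu, n_units = relu_step_network(w, xs, delta=1e-3)
    print(n_units, g_relu(0.5), g_step(0.5))
```

The ReLU network and the step function differ only on windows of width $\delta$ to the left of each breakpoint, so their $L^1$ distance on $[0,1]$ is at most $\delta$ times the total size of the jumps and can be made arbitrarily small.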

Theorems & Definitions (10)

Lemma 3.1
Theorem 3.1
Theorem 4.1
Theorem 4.2
Lemma A.1
Lemma A.2
Corollary A.1
Lemma A.3
Lemma A.4
Lemma B.1
