A Stein Identity for q-Gaussians with Bounded Support

Sophia Sklaviadis; Thomas Moellenhoff; Andre F. T. Martins; Mario A. T. Figueiredo; Mohammad Emtiyaz Khan

TL;DR

This work considers the class of bounded-support $q$-Gaussians and derives a new Stein identity. The resulting gradient estimators have nearly identical forms to the Gaussian ones and are similarly easy to implement.

Abstract

Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for q-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.
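
For reference, the classical Gaussian identity that the paper generalizes reads, in one dimension, $\mathbb{E}[(x-\mu) f(x)] = \sigma^2\, \mathbb{E}[f'(x)]$ for $x \sim \mathcal{N}(\mu, \sigma^2)$. The sketch below checks it by Monte Carlo. It covers only the Gaussian case, not the paper's $q$-Gaussian result; the test function $f = \sin$, the parameter values, and the name `stein_check` are illustrative assumptions.

```python
import math
import random


def stein_check(mu=0.5, sigma=1.2, n=200_000, seed=0):
    """Estimate both sides of the 1-D Gaussian Stein identity for f(x) = sin(x).

    Returns (lhs, rhs) where
      lhs ~ E[(x - mu) * f(x)]
      rhs ~ sigma^2 * E[f'(x)]
    with x ~ N(mu, sigma^2). All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    lhs = 0.0  # running sum of (x - mu) * sin(x)
    rhs = 0.0  # running sum of sigma^2 * cos(x)
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        lhs += (x - mu) * math.sin(x)
        rhs += sigma**2 * math.cos(x)
    return lhs / n, rhs / n


lhs, rhs = stein_check()
# Closed form for this f: sigma^2 * cos(mu) * exp(-sigma^2 / 2).
exact = 1.2**2 * math.cos(0.5) * math.exp(-(1.2**2) / 2)
print(lhs, rhs, exact)
```

The two estimates should agree with each other and with the closed form up to Monte Carlo error. The right-hand side typically has lower variance here, which is one reason Stein-type identities are useful for gradient estimation.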

Paper Structure (23 sections, 7 theorems, 110 equations, 3 figures, 1 table)

This paper contains 23 sections, 7 theorems, 110 equations, 3 figures, 1 table.

Introduction
Bounded-Support $q$-Gaussian Distributions
A Stein-type identity for bounded-support $q$-Gaussians
Bonnet- and Price-type Theorems
Applications
Bounded-variance Monte Carlo estimators
Numerical experiments
Synthetic logistic regression experiment.
Variational SGD with $q$-Gaussian noise.
Conclusion
Proof of Bonnet's and Price's theorems with Gaussians
Gaussian Stein, Bonnet, and Price.
Gaussian Stein and Price's theorem.
Elliptical Laws
Generalized Pearson Type II.
...and 8 more sections

Key Result

Lemma 1

The density of the generalized Pearson Type II subfamily with $q<1$ can be written as a $q$-Gaussian: where $\exp_{q}(t) := \left[1+t/m \right]_+^m$ denotes the $q$-deformed exponential function (Naudts, 2002), with $m = 1/(1-q)$ and $[ u ]_+ = \max( 0, u)$. The support radius $R$ is a function of $q$ and $D$ and is given by
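
The $q$-deformed exponential defined in Lemma 1 can be sketched directly from its formula. The function name `exp_q` and the argument check are assumptions made for illustration; only the expression $[1+t/m]_+^m$ with $m = 1/(1-q)$ comes from the text.

```python
import math


def exp_q(t, q):
    """q-deformed exponential [1 + t/m]_+^m with m = 1/(1 - q), for q < 1."""
    if not q < 1:
        raise ValueError("this sketch only covers q < 1")
    m = 1.0 / (1.0 - q)
    # [u]_+ = max(0, u) clips the base at zero, which is what
    # produces bounded support when exp_q is used as a density generator.
    return max(0.0, 1.0 + t / m) ** m


# As q approaches 1, exp_q approaches the ordinary exponential.
for q in (0.0, 0.5, 0.9, 0.999):
    print(q, exp_q(-1.0, q), math.exp(-1.0))
```

For $q = 0$ the function is the clipped linear map $[1+t]_+$, so it vanishes for every $t \le -1$; larger $q$ pushes the cutoff further out.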

Figures (3)

Figure 1: Bounded-support $q$-Gaussians are a subclass of location-scale families that are elliptically contoured; that is, they are obtained by composing a quadratic $s(\mathbf{x}) = (\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ with a generator function $g$, so that $p(\mathbf{x}) = g\bigl(s(\mathbf{x})\bigr)$. On the right, we show three examples of $q$-Gaussians for $q=0$, $0.5$, and $0.99$. For larger $q$, the base densities (black curves) are less peaked and have larger support. We also show the first associated escort $2-q$-Gaussian densities (gray curves), which are slightly more peaked than their base densities. As $q\to 1$, $q$-Gaussians converge to Gaussians.
Figure 2: Synthetic logistic regression. Left: For $D \in \{10, 50, 200\}$ and $q \in \{0.0, 0.5, 0.8, 1\}$, we draw 8 Monte Carlo samples and compute the empirical per-coordinate gradient variance $\frac{1}{D}\sum_{j=1}^D \operatorname{Var}(\widehat{\nabla} F(\mathbf{w}^\star)_j)$, averaged over $50$ independent repetitions; all standard errors are below $0.003$. Right: Maximum radius of the $q$-Gaussian support against $D$.
Figure 3: Sharpness-Aware Minimization (SAM) (Foret et al., 2020) considers an adversarial perturbation over a compact ball. Variational stochastic gradient descent (VSGD) with Gaussian weight perturbations averages over the whole space (unbounded support). Our proposed $q$-VSGD uses $q$-Gaussian weight perturbations, which have bounded support as in SAM but are averaged over as in VSGD. The method thus combines the two complementary features of SAM and VSGD.
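
The variance metric reported in Figure 2, $\frac{1}{D}\sum_{j=1}^D \operatorname{Var}(\widehat{\nabla} F_j)$, can be sketched as follows. The function name, the input layout (one gradient vector per repetition), and the use of the unbiased sample variance are assumptions; the source does not say which variance estimator was used.

```python
from statistics import mean, variance


def per_coordinate_variance(grad_samples):
    """Mean over coordinates of the sample variance of each gradient entry.

    grad_samples: list of gradient estimates, each a length-D sequence.
    Uses the unbiased (n - 1) sample variance, which is an assumption.
    """
    # Transpose so that each tuple holds one coordinate across all samples.
    columns = list(zip(*grad_samples))
    return mean(variance(col) for col in columns)


# Toy input: coordinate 0 varies across estimates, coordinate 1 does not.
grads = [[1.0, 2.0], [3.0, 2.0]]
print(per_coordinate_variance(grads))
```

Averaging over coordinates makes the number comparable across different dimensions $D$, which is how the figure compares $D = 10$, $50$, and $200$.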

Theorems & Definitions (16)

Lemma 1
Theorem 1: Bounded-support $q$-Gaussian Stein identity
Lemma 2
Theorem 2: $q$-Bonnet
Theorem 3: $q$-Price
Proposition 1: Bounded variance MC estimators
proof
proof: Proof of Price
Remark 1: Symmetry in $\boldsymbol{\Sigma}$
proof: Proof of Lemma 1
...and 6 more
