A Stein Identity for q-Gaussians with Bounded Support

Sophia Sklaviadis, Thomas Moellenhoff, Andre F. T. Martins, Mario A. T. Figueiredo, Mohammad Emtiyaz Khan

TL;DR

This work considers the class of bounded-support $q$-Gaussians and derives a new Stein identity, leading to gradient estimators that have nearly identical forms to the Gaussian ones and are similarly easy to implement.

Abstract

Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for $q$-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.
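
For reference, the classical Gaussian Stein identity that the abstract builds on states that $\mathbb{E}[(\mathbf{x}-\boldsymbol{\mu}) f(\mathbf{x})] = \boldsymbol{\Sigma}\, \mathbb{E}[\nabla f(\mathbf{x})]$ for $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and smooth $f$. The following is a minimal Monte Carlo sketch of that Gaussian case only. The function name `stein_check`, the test function $f(\mathbf{x}) = \sum_i \sin(x_i)$, the parameters, and the sample size are illustrative assumptions and are not taken from the paper; the $q$-Gaussian identity derived in the paper is not reproduced here.

```python
import numpy as np

def stein_check(mu, Sigma, n=200_000, seed=0):
    """Estimate both sides of the Gaussian Stein identity by Monte Carlo.

    Uses the illustrative test function f(x) = sum_i sin(x_i),
    whose gradient is cos(x) elementwise.
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, Sigma, size=n)   # shape (n, D)
    f = np.sin(x).sum(axis=1)                        # f(x), shape (n,)
    grad_f = np.cos(x)                               # grad f(x), shape (n, D)
    lhs = ((x - mu) * f[:, None]).mean(axis=0)       # E[(x - mu) f(x)]
    rhs = Sigma @ grad_f.mean(axis=0)                # Sigma E[grad f(x)]
    return lhs, rhs

mu = np.array([0.3, -0.5])
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
lhs, rhs = stein_check(mu, Sigma)
print(lhs, rhs)  # the two vectors should agree up to Monte Carlo error
```

The left-hand side needs only function values while the right-hand side needs gradients, which is why identities of this kind are useful for building gradient estimators.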

Paper Structure

This paper contains 23 sections, 7 theorems, 110 equations, 3 figures, and 1 table.

Key Result

Lemma 1

The density of the generalized Pearson Type II subfamily with $q<1$ can be written as a $q$-Gaussian in terms of the $q$-deformed exponential function $\exp_{q}(t) := \left[1+t/m \right]_+^m$ (Naudts, 2002), with $m = 1/(1-q)$ and $[ u ]_+ = \max( 0, u)$. The support radius $R$ is a function of $q$ and $D$.
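
The $q$-deformed exponential in Lemma 1 can be sketched directly from its definition. The function name `exp_q` and the NumPy implementation below are illustrative assumptions, not code from the paper, and the sketch covers only the function itself, not the normalized density or the support radius $R$.

```python
import numpy as np

def exp_q(t, q):
    """q-deformed exponential [1 + t/m]_+^m with m = 1/(1-q), for q < 1."""
    if not q < 1:
        raise ValueError("this sketch only covers q < 1")
    m = 1.0 / (1.0 - q)
    # [u]_+ = max(0, u) clips the base at zero before raising to the power m.
    return np.maximum(0.0, 1.0 + np.asarray(t, dtype=float) / m) ** m

t = np.linspace(-3.0, 0.0, 7)
print(exp_q(t, 0.0))    # m = 1: a clipped linear function, zero for t <= -1
print(exp_q(t, 0.5))    # m = 2: a clipped quadratic, zero for t <= -2
print(exp_q(t, 0.999))  # close to np.exp(t) on this range
```

Because $\exp_q(t)$ vanishes for $t \le -m$, any density built from it has bounded support, and as $q \to 1$ (so $m \to \infty$) the function approaches the ordinary exponential.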

Figures (3)

  • Figure 1: Bounded-support $q$-Gaussians are a subclass of location-scale families that are elliptically contoured, that is, they are obtained by composing a quadratic $s(\mathbf{x}) = (\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ with a generator function $g$, thus $p(\mathbf{x}) = g\bigl(s(\mathbf{x})\bigr)$. On the right, we show three examples of $q$-Gaussians for $q=0$, $0.5$, and $0.99$. For larger $q$, the base densities (black curves) are less peaked and have larger support. We also show the first associated escort $2-q$-Gaussian densities (gray curves), which are slightly more peaked than their base densities. As $q\to 1$, $q$-Gaussians converge to Gaussians.
  • Figure 2: Synthetic logistic regression. Left: For $D \in \{10, 50, 200\}$ and $q \in \{0.0, 0.5, 0.8, 1\}$, we draw 8 Monte Carlo samples and compute the empirical per-coordinate gradient variance $\frac{1}{D}\sum_{j=1}^D \operatorname{Var}(\widehat{\nabla} F(\mathbf{w}^\star)_j)$, averaged over $50$ independent repetitions; standard errors are all below $0.003$. Right: Maximum radius of the $q$-Gaussian support against $D$.
  • Figure 3: Sharpness-Aware Minimization (SAM) (Foret et al., 2020) considers an adversarial perturbation over a compact ball. Variational stochastic gradient descent (VSGD) with Gaussian weight perturbations averages over the whole space (unbounded support). Our proposed $q$-VSGD uses $q$-Gaussian weight perturbations, which have bounded support as in SAM, but averages over perturbations as in VSGD. The method thus combines the two complementary features of SAM and VSGD.

Theorems & Definitions (16)

  • Lemma 1
  • Theorem 1: Bounded-support $q$-Gaussian Stein identity
  • Lemma 2
  • Theorem 2: $q$-Bonnet
  • Theorem 3: $q$-Price
  • Proposition 1: Bounded variance MC estimators
  • Proof
  • Proof: Proof of Price
  • Remark 1: Symmetry in $\boldsymbol{\Sigma}$
  • Proof: Proof of Lemma 1
  • ...and 6 more