Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures

Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt

TL;DR

This work generalizes Stein's lemma beyond Gaussian distributions to exponential-family (EF) mixtures and Gaussian variance-mean mixtures with arbitrary covariance, and links Stein's identities to the reparameterization trick under weak regularity assumptions. It derives first- and second-order gradient identities for expectations under a broad class of distributions, including the multivariate Student's t, skew Gaussian, exponentially modified Gaussian, and normal-inverse-Gaussian, within a unified implicit reparameterization framework. The results enable low-variance gradient estimators for variational inference and Bayesian neural networks that use Gaussian variance-mean mixtures, and provide scalable identities for EF mixtures via structured Jacobians and latent-variable representations. Overall, the paper extends gradient estimation techniques for probabilistic modeling and learning to rich mixture models.

Abstract

Stein's method (Stein, 1973; 1981) is a powerful tool for statistical applications and has significantly impacted machine learning. Stein's lemma plays an essential role in Stein's method. Previous applications of Stein's lemma either required strong technical assumptions or were limited to Gaussian distributions with restricted covariance structures. In this work, we extend Stein's lemma to exponential-family mixture distributions, including Gaussian distributions with full covariance structures. Our generalization enables us to establish a connection between Stein's lemma and the reparameterization trick to derive gradients of expectations of a large class of functions under weak assumptions. Using this connection, we can derive many new reparameterizable gradient identities that go beyond the reach of existing works. For example, we give gradient identities when the expectation is taken with respect to Student's t-distribution, skew Gaussian, exponentially modified Gaussian, and normal inverse Gaussian.

Paper Structure

This paper contains 43 sections, 25 theorems, and 131 equations.

Key Result

Lemma 1

(Stein's Lemma): Let $h(\cdot):\mathcal{R}\to \mathcal{R}$ be locally absolutely continuous (AC), and let $q(z)$ be a univariate Gaussian distribution denoted by $\mathcal{N}(z|\mu,\sigma)$, where $\mu$ is its mean and $\sigma$ is its variance. The following first-order identity holds:

$$\mathbb{E}_{q}\left[ \frac{-\nabla_z q(z)}{q(z)}\, h(z) \right] = \mathbb{E}_{q}\left[ \nabla_z h(z) \right],$$

where $\frac{-\nabla_z q(z) }{q(z)}= \sigma^{-1} \left( z-\mu \right)$.
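
The first-order Stein identity for a univariate Gaussian, $\mathbb{E}_{q}[\sigma^{-1}(z-\mu)\,h(z)] = \mathbb{E}_{q}[\nabla_z h(z)]$, can be sanity-checked numerically. The sketch below is an illustration only and is not taken from the paper: the test function $h(z)=\sin z$, the values $\mu=0.5$ and $\sigma=2$, and the helper name `stein_check` are all assumptions chosen for the example. For this choice of $h$, the right-hand side has the closed form $\cos(\mu)\,e^{-\sigma/2}$, which gives an independent reference value.

```python
import math
import random


def stein_check(h, dh, mu, var, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both sides of Stein's first-order identity.

    Returns (lhs, rhs), where
      lhs estimates E_q[(z - mu) / var * h(z)]
      rhs estimates E_q[dh(z)]
    for q = N(mu, var). Here `var` is the variance, i.e. sigma in Lemma 1.
    `stein_check` is a hypothetical helper written for this illustration.
    """
    rng = random.Random(seed)
    std = math.sqrt(var)  # random.gauss expects a standard deviation
    lhs_sum = 0.0
    rhs_sum = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu, std)
        lhs_sum += (z - mu) / var * h(z)  # score-weighted function value
        rhs_sum += dh(z)                  # derivative of the function
    return lhs_sum / n_samples, rhs_sum / n_samples


# Assumed example values (not from the paper).
mu, var = 0.5, 2.0
lhs, rhs = stein_check(math.sin, math.cos, mu, var)

# Closed form of E[cos(z)] for z ~ N(mu, var).
exact = math.cos(mu) * math.exp(-var / 2.0)

print(f"E[(z - mu)/var * sin(z)] ~ {lhs:.4f}")
print(f"E[cos(z)]                ~ {rhs:.4f}")
print(f"closed form              = {exact:.4f}")
```

Both estimates should agree with the closed-form value up to Monte Carlo error. The derivative-based side is the form that a reparameterization-style estimator would use, since it only requires $\nabla_z h$ evaluated at samples of $z$.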

Theorems & Definitions (32)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 2
  • Lemma 5
  • ...and 22 more