
Sylvester Normalizing Flows for Variational Inference

Rianne van den Berg, Leonard Hasenclever, Jakub M. Tomczak, Max Welling

TL;DR

The paper addresses the limited flexibility of simple variational posteriors in variational inference (VI) by introducing Sylvester normalizing flows (SNFs), a generalization of planar flows that uses Sylvester's determinant identity to keep the Jacobian determinant tractable while removing the single-unit bottleneck of planar flows. SNFs are parameterized to allow flexible, data-dependent transformations. There are three variants that differ in how the orthogonal matrix is parameterized (Orthogonal, Householder, and Triangular Sylvester flows), and a hypernetwork produces data-conditioned flow parameters. Empirical results on MNIST, FreyFaces, Omniglot, and Caltech 101 Silhouettes show that SNFs often outperform planar flows and inverse autoregressive flows (IAF), particularly on larger or more complex datasets, although some datasets, such as FreyFaces, favor planar flows. The approach offers a scalable, expressive, and data-adaptive way to enrich variational posteriors, potentially tightening the ELBO and improving generative performance in VAEs and related models.
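To make the description above concrete, here is a minimal NumPy sketch of one Sylvester flow step. It assumes the QR-based form $\mathbf{z}' = \mathbf{z} + \mathbf{Q}\mathbf{R}\,h(\tilde{\mathbf{R}}\mathbf{Q}^\top\mathbf{z} + \mathbf{b})$ with $h = \tanh$, where $\mathbf{Q}$ is $D\times M$ with orthonormal columns and $\mathbf{R}$, $\tilde{\mathbf{R}}$ are upper-triangular $M\times M$ matrices. The function names, the random (non-learned) parameters, and the tanh squashing of the diagonals are illustrative assumptions, not the authors' code. Here $\mathbf{Q}$ comes from a plain QR decomposition rather than from any of the three SNF parameterizations. The last lines compare the cheap $M$-term log-determinant with a brute-force $D\times D$ one.

```python
import numpy as np


def sylvester_step(z, Q, R, R_tilde, b):
    """One Sylvester flow step z' = z + Q R tanh(R_tilde Q^T z + b).

    z: (D,); Q: (D, M) with orthonormal columns; R, R_tilde: (M, M)
    upper triangular; b: (M,). Returns (z_new, log|det dz'/dz|).
    """
    a = R_tilde @ (Q.T @ z) + b          # M pre-activations (M hidden units)
    z_new = z + Q @ (R @ np.tanh(a))
    h_prime = 1.0 - np.tanh(a) ** 2       # tanh'(a), in (0, 1]
    # Sylvester's identity plus Q^T Q = I_M reduce the D x D Jacobian
    # determinant to a triangular M x M one:
    # prod_i (1 + h'(a_i) * R_tilde[i, i] * R[i, i]).
    factors = 1.0 + h_prime * np.diag(R_tilde) * np.diag(R)
    return z_new, np.sum(np.log(np.abs(factors)))


def random_params(D, M, rng):
    """Hypothetical helper: random (non-learned) parameters for one step."""
    Q, _ = np.linalg.qr(rng.normal(size=(D, M)))  # orthonormal columns
    R = np.triu(rng.normal(size=(M, M)), k=1)
    R_tilde = np.triu(rng.normal(size=(M, M)), k=1)
    # tanh-squashed diagonals give |R[i, i] * R_tilde[i, i]| < 1; with
    # tanh' <= 1 every Jacobian factor above stays positive.
    R += np.diag(np.tanh(rng.normal(size=M)))
    R_tilde += np.diag(np.tanh(rng.normal(size=M)))
    b = rng.normal(size=M)
    return Q, R, R_tilde, b


rng = np.random.default_rng(0)
D, M = 6, 3
Q, R, R_tilde, b = random_params(D, M, rng)
z = rng.normal(size=D)
z_new, log_det = sylvester_step(z, Q, R, R_tilde, b)

# Brute-force check on the full D x D Jacobian I_D + Q R diag(h') R_tilde Q^T.
a = R_tilde @ (Q.T @ z) + b
J = np.eye(D) + Q @ R @ np.diag(1.0 - np.tanh(a) ** 2) @ R_tilde @ Q.T
print(np.isclose(log_det, np.linalg.slogdet(J)[1]))  # True
```

The cost of the log-determinant grows with $M$, not $D$, which is why $M$ can be raised above one without giving up tractability.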

Abstract

Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
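The single-unit bottleneck mentioned in the abstract can be seen directly. Below is a minimal NumPy sketch of a planar flow step $\mathbf{z}' = \mathbf{z} + \mathbf{u}\,h(\mathbf{w}^\top\mathbf{z} + b)$ with $h = \tanh$. The parameter names and random values are illustrative assumptions, and the sketch does not enforce any invertibility constraint on $\mathbf{u}$ and $\mathbf{w}$. Its log-determinant is Sylvester's identity with $M = 1$. Its Jacobian differs from the identity only by a rank-one matrix, however large $D$ is.

```python
import numpy as np


def planar_step(z, u, w, b):
    """Planar flow z' = z + u * tanh(w^T z + b), with u, w: (D,), b scalar."""
    a = w @ z + b                     # one scalar pre-activation: one unit
    z_new = z + u * np.tanh(a)
    h_prime = 1.0 - np.tanh(a) ** 2
    # Sylvester's identity with M = 1: det(I_D + h' u w^T) = 1 + h' w^T u.
    log_det = np.log(np.abs(1.0 + h_prime * (w @ u)))
    return z_new, log_det


rng = np.random.default_rng(1)
D = 5
u, w, b = rng.normal(size=D), rng.normal(size=D), 0.3
z = rng.normal(size=D)
z_new, log_det = planar_step(z, u, w, b)

# The Jacobian is the identity plus a rank-one term, whatever D is.
h_prime = 1.0 - np.tanh(w @ z + b) ** 2
residual = h_prime * np.outer(u, w)
J = np.eye(D) + residual
print(np.isclose(log_det, np.linalg.slogdet(J)[1]))  # True
print(np.linalg.matrix_rank(residual))
```

A Sylvester flow replaces the vectors $\mathbf{u}$ and $\mathbf{w}$ with $D\times M$ and $M\times D$ factors, so one step can change $\mathbf{z}$ along up to $M$ directions instead of one.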

Paper Structure

This paper contains 21 sections, 2 theorems, 30 equations, 3 figures, and 3 tables.

Key Result

Theorem 1

For all $\mathbf{A}\in\mathbb{R}^{D\times M}$ and $\mathbf{B}\in \mathbb{R}^{M\times D}$, $\det(\mathbf{I}_D+\mathbf{A}\mathbf{B})=\det(\mathbf{I}_M+\mathbf{B}\mathbf{A})$, where $\mathbf{I}_M$ and $\mathbf{I}_D$ are the $M$- and $D$-dimensional identity matrices, respectively.
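Theorem 1 (Sylvester's determinant identity, $\det(\mathbf{I}_D+\mathbf{A}\mathbf{B})=\det(\mathbf{I}_M+\mathbf{B}\mathbf{A})$) is what keeps the flows cheap. The left-hand determinant is $D\times D$, while the right-hand one is only $M\times M$. Below is a quick numerical check with NumPy on random matrices. The helper name `check_sylvester` and the chosen shapes are illustrative. The $M = 1$ case is the matrix determinant lemma behind planar flows.

```python
import numpy as np


def check_sylvester(D, M, seed=0):
    """Return det(I_D + A B) and det(I_M + B A) for random A (D x M), B (M x D)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(D, M))
    B = rng.normal(size=(M, D))
    lhs = np.linalg.det(np.eye(D) + A @ B)  # D x D determinant
    rhs = np.linalg.det(np.eye(M) + B @ A)  # only an M x M determinant
    return lhs, rhs


for D, M in [(6, 2), (10, 1), (4, 4)]:
    lhs, rhs = check_sylvester(D, M)
    print(D, M, np.isclose(lhs, rhs))  # True in every case
```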

Figures (3)

  • Figure 1: Since the ELBO is only a lower bound on the log marginal likelihood, the two do not necessarily share the same local maxima. The looser the ELBO, the more this can bias maximum likelihood estimates of the model parameters.
  • Figure 2: Different amortization strategies for Sylvester normalizing flows and Inverse Autoregressive Flows. Left: our inference network produces input-dependent flow parameters through a hypernetwork (Ha et al., 2016). This strategy is also employed by planar flows. Right: IAF introduces a measure of $\mathbf{x}$ dependence through a context $\mathbf{h}(\mathbf{x})$. This context acts as an additional input for each transformation. The flow parameters themselves are independent of $\mathbf{x}$, but the number of hidden units per flow is larger than in SNF.
  • Figure 3: The negative evidence lower bound for static MNIST. The results for H-SNF with 4 reflections per orthogonal matrix are left out for clarity, as they are very similar to the results with 8 reflections. Each model is evaluated 3 times. The shaded areas indicate $\pm$ one standard deviation.
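The left-hand strategy in Figure 2, where the inference network outputs the flow parameters themselves, can be sketched as follows. This minimal NumPy sketch rests on hypothetical simplifications, not the paper's architecture: a linear hypernetwork with weights `W_diag`, `W_off`, and `W_b`, a random feature vector `h_x` standing in for the encoder output, and a single fixed `Q` shared by all datapoints. It also applies the change of variables $\log q_1(\mathbf{z}_1) = \log q_0(\mathbf{z}_0) - \log|\det \partial\mathbf{z}_1/\partial\mathbf{z}_0|$ that enters the ELBO.

```python
import numpy as np

D, M, H = 4, 2, 8  # latent dim, number of Sylvester hidden units, feature dim
rng = np.random.default_rng(0)
iu = np.triu_indices(M, k=1)  # strictly upper-triangular positions
n_off = len(iu[0])

# Hypothetical linear hypernetwork: one weight matrix per parameter group.
W_diag = rng.normal(scale=0.5, size=(2 * M, H))
W_off = rng.normal(scale=0.5, size=(2 * n_off, H))
W_b = rng.normal(scale=0.5, size=(M, H))
# Simplification of this sketch: one fixed Q shared by all datapoints.
Q, _ = np.linalg.qr(rng.normal(size=(D, M)))


def flow_params(h_x):
    """Map encoder features h(x) to per-datapoint R, R_tilde, b."""
    d = np.tanh(W_diag @ h_x)  # |R_ii|, |R_tilde_ii| < 1
    off = W_off @ h_x
    R, R_tilde = np.zeros((M, M)), np.zeros((M, M))
    R[np.diag_indices(M)], R_tilde[np.diag_indices(M)] = d[:M], d[M:]
    R[iu], R_tilde[iu] = off[:n_off], off[n_off:]
    return R, R_tilde, W_b @ h_x


def sylvester_step(z, R, R_tilde, b):
    """z' = z + Q R tanh(R_tilde Q^T z + b) and its log|det Jacobian|."""
    a = R_tilde @ (Q.T @ z) + b
    z_new = z + Q @ (R @ np.tanh(a))
    # Each factor is positive because |R_ii * R_tilde_ii| < 1 and tanh' <= 1.
    factors = 1.0 + (1.0 - np.tanh(a) ** 2) * np.diag(R_tilde) * np.diag(R)
    return z_new, np.sum(np.log(factors))


def sample_and_log_q(h_x, mu, log_sigma, eps):
    """Reparameterized z0 ~ N(mu, sigma^2), one flow step, log q_1(z1)."""
    z0 = mu + np.exp(log_sigma) * eps
    log_q0 = -0.5 * np.sum(eps ** 2 + 2.0 * log_sigma + np.log(2.0 * np.pi))
    z1, log_det = sylvester_step(z0, *flow_params(h_x))
    return z1, log_q0 - log_det  # change of variables


h1, h2 = rng.normal(size=H), rng.normal(size=H)  # features of two inputs
print(np.allclose(flow_params(h1)[0], flow_params(h2)[0]))  # False
z1, log_q1 = sample_and_log_q(h1, np.zeros(D), np.zeros(D), rng.normal(size=D))
```

Because `flow_params` is recomputed from `h_x`, two inputs get two different flows. This is the contrast with the right panel, where IAF keeps its flow parameters independent of $\mathbf{x}$ and feeds in a context instead.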

Theorems & Definitions (3)

  • Theorem 1: Sylvester's determinant identity
  • Theorem 2
  • Proof