Scalable Random Feature Latent Variable Models

Ying Li, Zhidi Lin, Yuhao Liu, Michael Minyi Zhang, Pablo M. Olmos, Petar M. Djurić

TL;DR

A scalable RFLVM framework based on variational Bayesian inference (VBI), a deterministic and optimization-based alternative to sampling methods, and a novel inference algorithm, block coordinate descent variational inference (BCD-VI), which partitions variational parameters into blocks and applies tailored solvers to optimize them efficiently.

Abstract

Random feature latent variable models (RFLVMs) represent the state of the art in latent variable models: they handle non-Gaussian likelihoods and effectively uncover patterns in high-dimensional data. However, their heavy reliance on Monte Carlo sampling causes scalability issues that make these models difficult to use on datasets with a massive number of observations. To scale up RFLVMs, we turn to optimization-based variational Bayesian inference (VBI), which is known for its scalability compared to sampling-based methods. However, implementing VBI for RFLVMs poses challenges, such as the lack of explicit probability distribution functions (PDFs) for the Dirichlet process (DP) in the kernel learning component, and the incompatibility of existing VBI algorithms with RFLVMs. To address these issues, we introduce a stick-breaking construction for the DP to obtain an explicit PDF, and a novel VBI algorithm called "block coordinate descent variational inference" (BCD-VI). This enables the development of a scalable version of RFLVMs, or in short, SRFLVM. Across various real-world datasets, our proposed method shows scalability, computational efficiency, superior performance in generating informative latent representations, and the ability to impute missing data, outperforming state-of-the-art competitors.
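The stick-breaking construction mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows the generic truncated stick-breaking recipe for DP mixture weights, and the function name `stick_breaking_weights`, the truncation level `num_sticks`, and the concentration value `alpha=2.0` are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Draw truncated stick-breaking weights for a DP with concentration alpha.

    Hypothetical helper for illustration; the truncation level is an assumption.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(num_sticks):
        # v_k ~ Beta(1, alpha) is the fraction of the remaining stick taken at step k.
        # The last stick takes everything that is left so the weights sum to one.
        v = 1.0 if k == num_sticks - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


weights = stick_breaking_weights(alpha=2.0, num_sticks=10)
```

Because each weight is an explicit function of independent Beta draws, the joint density of the weights can be written down directly, which is the property the abstract relies on when it replaces the DP with a stick-breaking construction.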

Paper Structure

This paper contains 43 sections, 4 theorems, 73 equations, 6 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Let $\kappa(\mathbf{x}, \mathbf{x}^\prime)$ be a positive definite stationary kernel function, and let $\boldsymbol{\varphi}(\mathbf{x})$ be an associated randomized feature map parameterized by $\mathbf{W} \triangleq \{\mathbf{w}_l\}_{l=1}^{L/2} \in \mathbb{R}^{\frac{L}{2} \times Q}$, whose rows are independent and identically distributed (i.i.d.) random vectors drawn from the spectral density $p(\mathbf{w})$.
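The explicit feature map is not reproduced in the statement above. As a minimal sketch, assuming the standard random Fourier feature construction with paired sine and cosine features, and assuming a unit-lengthscale RBF kernel (whose spectral density is a standard Gaussian), the inner product of two feature vectors approximates the kernel value. The function names and the choice of kernel are illustrative assumptions, not the paper's definitions.

```python
import math
import random


def sample_frequencies(num_features, dim, seed=0):
    """Draw L/2 frequency vectors w_l ~ N(0, I), the spectral density of the RBF kernel."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features // 2)]


def feature_map(x, W):
    """Assumed map: sqrt(2/L) * [sin(w_1.x), cos(w_1.x), ..., sin(w_{L/2}.x), cos(w_{L/2}.x)]."""
    L = 2 * len(W)
    scale = math.sqrt(2.0 / L)
    feats = []
    for w in W:
        proj = sum(wi * xi for wi, xi in zip(w, x))
        feats.append(scale * math.sin(proj))
        feats.append(scale * math.cos(proj))
    return feats


def rbf_kernel(x, y):
    """Exact unit-lengthscale RBF kernel, used here only as a reference value."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


W = sample_frequencies(num_features=4000, dim=2)
x, y = [0.3, -0.5], [1.0, 0.2]
approx = sum(a * b for a, b in zip(feature_map(x, W), feature_map(y, W)))
exact = rbf_kernel(x, y)
self_similarity = sum(a * a for a in feature_map(x, W))
```

Under these assumptions, `approx` is a Monte Carlo estimate of `exact` whose error shrinks as the number of features $L$ grows, and `self_similarity` equals 1 up to rounding because each sine and cosine pair contributes $2/L$.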

Figures (6)

  • Figure 1: Graphical model of RFLVMs. Arrows denote the dependency relations between variables. The blue and green circles denote the observed and variational variables, respectively. Smaller dots indicate the deterministic parameters of the model. A box around a node indicates that there are $K$, $M$, or $L/2$ nodes of this kind.
  • Figure 2: Graphical model of RFLVM with a stick-breaking construction. Arrows denote the dependency relations between variables. The blue and green circles denote the observed and variational variables, respectively. Smaller dots indicate the deterministic parameters of the model. A box around a node indicates that there are $K$, $M$, or $L/2$ nodes of this kind.
  • Figure 3: (a) Comparison of true latent variable $\mathbf{X}$ or true kernel matrix $\mathbf{K}$ with inferred latent variables $\hat{\mathbf{X}}$ or inferred kernel matrices $\hat{\mathbf{K}}$ obtained from various methods. (b) Logarithm of wall-time for model fitting plotted against $N$ or $M$.
  • Figure 4: (Left) Ground-truth kernel matrix; (Right Top) Gaussian case, $\hat{\mathbf{K}}$ obtained without marginalized $\mathbf{H}$ versus with marginalized $\mathbf{H}$; (Right Bottom) Bernoulli case, $\hat{\mathbf{K}}$ obtained without versus with the closed-form solution for the posterior of $\mathbf{H}$.
  • Figure 5: MNIST reconstruction task with missing pixels. From left to right: Ground truth, training images, reconstructions.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Theorem 1
  • proof
  • Example 1: Logistic distribution
  • Example 2: Gaussian Distribution
  • Example 3
  • Theorem 2
  • Corollary 2.1
  • Remark 1
  • Theorem 3
  • proof
  • ...and 3 more