Stability of Sequential and Parallel Coordinate Ascent Variational Inference

Debdeep Pati

Abstract

We highlight a striking difference in behavior between two widely used variants of coordinate ascent variational inference: the sequential and parallel algorithms. While such differences were known in the numerical analysis literature in simpler settings, they remain largely unexplored in the optimization-focused literature on variational inference in more complex models. Focusing on the moderately high-dimensional linear regression problem, we show that the sequential algorithm, although typically slower, enjoys convergence guarantees under more relaxed conditions than the parallel variant, which is often employed to facilitate block-wise updates and improve computational efficiency.
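
The simpler numerical-analysis setting mentioned above is presumably the classical contrast between Gauss-Seidel (sequential) and Jacobi (parallel) sweeps for a linear system $Ax=b$. The sketch below is not taken from the paper: the equicorrelated matrix `A`, the helper names, and the choice $p=3$, $r=0.8$ are illustrative assumptions. It compares the spectral radii of the two iteration matrices on a symmetric positive definite example.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def iteration_matrices(A):
    """Error-propagation matrices of one full sweep for A x = b.

    Sequential (Gauss-Seidel): T_seq = -(D + L)^{-1} U
    Parallel   (Jacobi):       T_par = -D^{-1} (L + U)
    """
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    T_seq = -np.linalg.solve(D + L, U)
    T_par = -np.linalg.solve(D, L + U)
    return T_seq, T_par

# Hypothetical example: equicorrelated matrix with unit diagonal and
# off-diagonal entries r. It is positive definite for -1/(p-1) < r < 1.
p, r = 3, 0.8
A = (1 - r) * np.eye(p) + r * np.ones((p, p))

T_seq, T_par = iteration_matrices(A)
rho_seq = spectral_radius(T_seq)
rho_par = spectral_radius(T_par)

print(f"sequential: rho = {rho_seq:.3f}")
print(f"parallel:   rho = {rho_par:.3f}")
```

Here the Jacobi iteration matrix is $-r(\mathbf{1}\mathbf{1}^\top - I)$, whose spectral radius is $r(p-1) = 1.6$, so the parallel sweep diverges. Gauss-Seidel converges for every symmetric positive definite matrix, so the sequential radius stays below one.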

Paper Structure

This paper contains 12 sections, 8 theorems, 40 equations, 3 figures.

Key Result

Theorem 1

If $\rho\{Dg(\bar{x})\} < 1$, where $\rho$ denotes the spectral radius and $Dg(\bar{x})$ the Jacobian of $g$ at $\bar{x}$, then the fixed point $\bar{x}$ of the nonlinear map $g$ is asymptotically stable.
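
As a rough illustration of how this criterion can be checked numerically, the sketch below locates a fixed point of a toy two-dimensional map by iteration, approximates the Jacobian there with central differences, and computes its spectral radius. The map `g`, the step size `h`, and the helper `numerical_jacobian` are hypothetical choices for illustration, not the CAVI update map studied in the paper.

```python
import numpy as np

def g(x):
    """Toy nonlinear map on R^2 (an assumption, not the paper's map)."""
    return np.array([0.5 * np.cos(x[1]), 0.5 * np.sin(x[0])])

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference approximation of the Jacobian Df(x)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

# Locate the fixed point by plain iteration x <- g(x).
x_bar = np.zeros(2)
for _ in range(200):
    x_bar = g(x_bar)

# Spectral radius of the Jacobian at the fixed point.
J = numerical_jacobian(g, x_bar)
rho = float(np.max(np.abs(np.linalg.eigvals(J))))

residual = float(np.linalg.norm(g(x_bar) - x_bar))
print(f"fixed-point residual: {residual:.2e}")
print(f"rho(Dg(x_bar)) = {rho:.4f}, below one: {rho < 1}")
```

For this toy map the partial derivatives are bounded by $0.5$ in absolute value, so the spectral radius is below one everywhere and iterates started near $\bar{x}$ return to it, which is the behavior the theorem describes.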

Figures (3)

  • Figure 1: Comparison of sequential (left) and parallel (right) versions for $p=2$.
  • Figure 2: Sequential CAVI for $(n,p,s)=(200,50,25)$. Left: variational mean $\mu_j$ (red) versus true coefficients $\beta_j$ (black). Right: ELBO as a function of iteration.
  • Figure 3: Distribution of $\log \rho(J)$ for sequential and parallel CAVI. Left: varying $p$ with $s=p$ and fixed $n =100$. Right: varying $s$ with fixed $(n=200, p=50)$.

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Remark 1: Plausibility of the contraction assumption
  • Theorem 3
  • Remark 2
  • Proposition 1
  • Proposition 2
  • proof
  • Lemma 1
  • proof
  • ...and 4 more