A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods

Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick

TL;DR

This work advances variational inference by generalizing tangent-approximation VI to strongly super-Gaussian likelihoods, enabling deterministic, geometry-driven updates with convergence guarantees. The TAVIE-SSG framework introduces alpha-fractional variational posteriors and an EM algorithm that scales linearly with data and is highly parallelizable. Theoretical results provide convergence and near-minimax variational-risk bounds under both fractional and standard likelihoods, while extensive experiments on heavy-tailed regression, Bayesian quantile regression, and spatial transcriptomics demonstrate superior scalability and competitive accuracy relative to state-of-the-art VI and MCMC methods. The approach offers a principled, scalable alternative for complex Bayesian models with non-Gaussian, heavy-tailed, or discrete responses, with practical implications across biostatistics, genomics, and econometrics.

Abstract

Variational inference, as an alternative to Markov chain Monte Carlo sampling, has played a transformative role in enabling scalable computation for complex Bayesian models. Nevertheless, existing approaches often depend on either rigid model-specific formulations or stochastic black-box optimization routines. Tangent approximation is a principled class of structured variational methods that exploits the geometry of the underlying probability model. However, its utility has largely been confined to logistic regression and related modeling regimes. In this article, we propose a novel variational framework based on tangent transformation for a broad class of probability models characterized by strongly super-Gaussian likelihoods. Our method leverages convex duality to construct tangent minorants of the log-likelihood, thereby inducing conjugacy with Gaussian priors over model parameters in an otherwise intractable setup. Under mild assumptions on the data-generating mechanism, we establish algorithmic convergence guarantees, a contribution that stands in contrast to the limited theoretical assurances typically available for black-box variational methods. Additionally, we derive near-minimax optimal bounds for the variational risk. Superior performance of our proposed methodology is illustrated on simulated and real-data scenarios that challenge state-of-the-art variational algorithms in terms of scalability and their ability to consistently capture complex underlying data structure.

Paper Structure

This paper contains 53 sections, 18 theorems, 218 equations, 31 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

For $\xi_i \in \mathbb{R}^{+}_0$, the $\mathsf{SSG}$ likelihood $p(y_i \mid \mathbf{x}_i, \theta)$ of Definition 2.1 admits a minorizer $\varphi(y_i \mid \mathbf{x}_i, \theta, \xi_i)$ for each $i\in [n]$, expressed in terms of $\gamma(t) := h(t^2) - t^2h'(t^2)$ and $A(t) := h'(t^2)$. Equality in the minorizer bound holds if and only if $|\zeta_i| = \xi_i$, i.e., $\varphi(y_i \mid \mathbf{x}_i, \theta, \xi_i)$ is tangent to $p(y_i \mid \mathbf{x}_i, \theta)$ at $|\zeta_i| = \xi_i$.
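
The tangency claim can be checked numerically with a minimal sketch. It assumes, purely for illustration, that $h$ is the Student's-$t$ log-kernel written as a function of $s = \zeta^2$ with normalizing constants dropped; this choice of $h$, the value of `NU`, and the helper names `log_minorizer` and `gap` are assumptions and not taken from the paper. Because this $h$ is convex in $s$, its tangent line at $s = \xi^2$ lies below it, which gives the lower bound $\gamma(\xi) + A(\xi)\zeta^2$ built from the $\gamma$ and $A$ defined in the proposition.

```python
import math

NU = 5.0  # degrees of freedom; illustrative value (assumption)


def h(s):
    """Assumed log-kernel as a function of s = zeta^2 (Student's-t, constants dropped)."""
    return -0.5 * (NU + 1.0) * math.log1p(s / NU)


def h_prime(s):
    """Derivative of h with respect to s."""
    return -0.5 * (NU + 1.0) / (NU + s)


def gamma(t):
    """gamma(t) = h(t^2) - t^2 h'(t^2), as defined in Proposition 1."""
    return h(t * t) - t * t * h_prime(t * t)


def A(t):
    """A(t) = h'(t^2), as defined in Proposition 1."""
    return h_prime(t * t)


def log_minorizer(zeta, xi):
    """Tangent (in zeta^2) lower bound on h(zeta^2), quadratic in zeta."""
    return gamma(xi) + A(xi) * zeta * zeta


def gap(zeta, xi):
    """h(zeta^2) minus the tangent bound; non-negative when h is convex."""
    return h(zeta * zeta) - log_minorizer(zeta, xi)


if __name__ == "__main__":
    xi = 1.0
    for zeta in (-3.0, -1.0, 0.0, 0.5, 1.0, 2.0):
        print(f"zeta={zeta:+.1f}  gap={gap(zeta, xi):.6f}")
```

Since $A(\xi) < 0$ under this assumed $h$, the bound is a Gaussian-shaped (quadratic in $\zeta$) function on the log scale, which is the kind of structure that makes a Gaussian prior conjugate to the minorizer.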

Figures (31)

  • Figure 1: Runtimes (in $\log$-scale) across $100$ data repetitions of $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ and competitors for Student's-$t$ (Type I $\mathsf{SSG}$) likelihood ($\nu=5$) in the Student's-$t$ simulation experiments, under varying sample sizes and feature dimensions.
  • Figure 2: Overview of the nonconvex landscape of $\mathsf{L}(\xi)$. Data $\mathcal{D}_n$ is generated from the Student's-$t$ $\mathsf{SSG}$ likelihood ($\nu=5, \tau^2=3$) with $\beta\sim \mathcal{N}_{p}(0, 0.5^2 I_p)$ and covariates $x_{ij}\sim \mathcal{N}_{1}(0, 1)$, i.i.d. Left: $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ converges in $25$ iterations for $(n, p) = (2, 2)$ with optimal $\xi^{\star} = (0.822, 1.368)$. Right: $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ converges in $66$ iterations for $(n, p) = (100, 50)$; the contour plot shows a randomly selected two-dimensional slice through the optimal iterate at $(\xi_{88}^{\star}, \xi_{79}^{\star}) = (0.897, 0.791)$ [marked using a red circle].
  • Figure 3: MSEs of $(\beta, \tau^2)$ (in $\log$-scale) across $100$ data repetitions of $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ and competitors for the Student's-$t$ $\mathsf{SSG}$ likelihood ($\nu=5$) under experiment E1: $n\in \{200, 500, 1000, 2000\},\; p=8$.
  • Figure 4: MSEs of $(\beta, \tau^2)$ (in $\log$-scale) across $100$ data repetitions of $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ and competitors for the Student's-$t$ $\mathsf{SSG}$ likelihood ($\nu=5$) under experiment E2: $p\in \{3, 8, 15, 20\},\; n=1000$.
  • Figure 5: Large-scale BQR performance of $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ with comparison against the FAST QR algorithm.
  • ...and 26 more figures

Theorems & Definitions (42)

  • Definition 2.1: Strongly super-Gaussian ($\mathsf{SSG}$) likelihood function
  • Proposition 1: Tangent minorizer
  • proof
  • Proposition 2: Stationarity equivalence
  • Theorem 1: Convergence of the $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ EM algorithm
  • Theorem 2: Convergence rate of the $\mathsf{TAVIE}\text{-}\mathsf{SSG}$ EM algorithm
  • Theorem 3: Variational risk bound for Type I $\mathsf{SSG}$ under $\alpha$-Rényi divergence
  • Theorem 4: Variational risk bound for Type II $\mathsf{SSG}$ under $\alpha$-Rényi divergence
  • Remark 3.1
  • Remark 3.2
  • ...and 32 more