Fast Mixing of Data Augmentation Algorithms: Bayesian Probit, Logit, and Lasso Regression

Holden Lee, Kexin Zhang

TL;DR

The paper advances the non-asymptotic understanding of data-augmentation Gibbs samplers for Bayesian probit, logit, and Lasso models by deriving polynomial mixing-time bounds within a conductance/isoperimetry framework. It provides explicit rates: for ProbitDA and LogitDA, $t_{\text{mix}} = \mathcal{O}\big(nd\log\frac{\log\eta}{\varepsilon}\big)$, and for LassoDA, $t_{\text{mix}} = \mathcal{O}\big(d^2(d\log d+n\log n)^2\log\frac{\eta}{\varepsilon}\big)$. For ProbitDA and LogitDA, these improve to $\tilde{\mathcal{O}}(n+d)$ with high probability when the data are independently generated from a bounded, sub-Gaussian, or log-concave distribution; feasible-start variants yield additional refinements. The approach reduces the DA analysis to one-step overlap bounds and employs a conductance-profile enhancement to handle the non-log-concave Lasso target via a log-concave transformation. Numerical experiments corroborate the theoretical bounds, illustrating the tight dependence on $n$ and the nuanced interaction between $n$ and $d$ for LassoDA. Overall, the results establish fast, non-asymptotic mixing guarantees for widely used Bayesian DA algorithms and contrast favorably with general-purpose sampling methods in high-dimensional regimes.
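
To make the LassoDA transition concrete, here is a minimal Python sketch of a Park and Casella (2008) style Gibbs sampler for the Bayesian Lasso. It assumes a model with no intercept, the prior $\pi(\sigma^2) \propto 1/\sigma^2$, and a fixed penalty `lam`; the function name `lasso_da` and its arguments are hypothetical, and the paper's LassoDA may use the blocked update of Rajaratnam et al. (2015) instead, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np


def lasso_da(X, y, lam, n_iter, seed=0):
    """Three-block Gibbs sampler for the Bayesian Lasso, Park-Casella style (sketch).

    Assumed model (no intercept): y ~ N(X beta, sigma2 I), beta_j ~ N(0, sigma2 tau2_j),
    tau2_j ~ Exp(rate = lam^2 / 2), and pi(sigma2) proportional to 1 / sigma2.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    xtx, xty = X.T @ X, X.T @ y
    sigma2, inv_tau2 = 1.0, np.ones(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # beta | sigma2, tau2, y ~ N(A^{-1} X^T y, sigma2 A^{-1}), A = X^T X + diag(1 / tau2)
        A = xtx + np.diag(inv_tau2)
        chol = np.linalg.cholesky(A)  # A = L L^T
        mean = np.linalg.solve(A, xty)
        # L^{-T} xi has covariance A^{-1} for xi ~ N(0, I)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(chol.T, rng.standard_normal(d))
        # sigma2 | beta, tau2, y ~ InvGamma(shape=(n + d)/2, scale=(||y - X beta||^2 + sum_j beta_j^2 / tau2_j)/2)
        resid = y - X @ beta
        scale = 0.5 * (resid @ resid + beta @ (inv_tau2 * beta))
        sigma2 = scale / rng.gamma((n + d) / 2.0, 1.0)  # scale / Gamma(shape, 1) ~ InvGamma(shape, scale)
        # 1 / tau2_j | beta, sigma2 ~ InverseGaussian(mean=sqrt(lam^2 sigma2 / beta_j^2), shape=lam^2)
        inv_tau2 = rng.wald(np.sqrt(lam ** 2 * sigma2 / beta ** 2), lam ** 2)
        draws[t] = beta
    return draws
```

Unlike ProbitDA and LogitDA, whose augmented variables are one per observation, the augmented variables here are the $d$ scales $\tau_j^2$.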

Abstract

Despite the widespread use of the data augmentation (DA) algorithm, the theoretical understanding of its convergence behavior remains incomplete. We prove the first non-asymptotic polynomial upper bounds on mixing times of three important DA algorithms: the DA algorithms for Bayesian Probit regression (Albert and Chib, 1993, ProbitDA), Bayesian Logit regression (Polson, Scott, and Windle, 2013, LogitDA), and Bayesian Lasso regression (Park and Casella, 2008, Rajaratnam et al., 2015, LassoDA). Concretely, we demonstrate that with an $\eta$-warm start, parameter dimension $d$, and sample size $n$, ProbitDA and LogitDA require $\mathcal{O}\left(nd\log\left(\frac{\log\eta}{\varepsilon}\right)\right)$ steps to obtain samples with at most $\varepsilon$ TV error, whereas LassoDA requires $\mathcal{O}\left(d^2(d\log d + n\log n)^2\log\left(\frac{\eta}{\varepsilon}\right)\right)$ steps. The results apply generally to settings with large $n$ and large $d$, including settings with highly imbalanced response data in Probit and Logit regression. The proofs are based on Markov chain conductance and isoperimetric inequalities. Assuming that the data are independently generated from a bounded, sub-Gaussian, or log-concave distribution, we improve the guarantees for ProbitDA and LogitDA to $\tilde{\mathcal{O}}(n+d)$ with high probability, and compare them with the best known guarantees of Langevin Monte Carlo and the Metropolis Adjusted Langevin Algorithm. We also discuss the mixing times of the three algorithms under feasible initialization.
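
As a concrete reference point for the ProbitDA chain, here is a minimal Python sketch of the Albert and Chib (1993) sampler: each step draws latent $z_i$ from a truncated normal given $\beta$, then draws $\beta$ from a Gaussian given $z$. The Gaussian prior $\beta \sim \mathcal{N}(0, vI)$ with $v$ = `prior_var`, the function names, and the inverse-CDF truncated-normal sampler are assumptions made for illustration, not details taken from the paper.

```python
import math
from statistics import NormalDist

import numpy as np

_STD = NormalDist()  # standard normal; only its inverse CDF is used


def _norm_cdf(x):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _std_normal_above(a, u):
    """Map u ~ Uniform(0, 1) to W ~ N(0, 1) conditioned on W > a (inverse CDF).

    -W is N(0, 1) truncated to (-inf, -a), so -W = Phi^{-1}(u * Phi(-a)).
    """
    return -_STD.inv_cdf(u * _norm_cdf(-a))


def draw_latent(mu, y, rng):
    """Step 1: z_i ~ N(mu_i, 1) truncated to z_i > 0 if y_i = 1 and z_i < 0 if y_i = 0."""
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=len(mu))  # u_i in (0, 1)
    z = np.empty(len(mu))
    for i, (m, yi, ui) in enumerate(zip(mu, y, u)):
        z[i] = m + _std_normal_above(-m, ui) if yi == 1 else m - _std_normal_above(m, ui)
    return z


def probit_da(X, y, n_iter, prior_var=100.0, seed=0):
    """Albert-Chib DA Gibbs sampler; assumes the prior beta ~ N(0, prior_var * I)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Step 2 target: beta | z ~ N(V X^T z, V), V = (X^T X + I / prior_var)^{-1}; V does not depend on z
    precision = X.T @ X + np.eye(d) / prior_var
    chol = np.linalg.cholesky(precision)  # precision = L L^T
    beta = np.zeros(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        z = draw_latent(X @ beta, y, rng)
        mean = np.linalg.solve(precision, X.T @ z)
        # L^{-T} xi has covariance (L L^T)^{-1} = V for xi ~ N(0, I)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(d))
        draws[t] = beta
    return draws
```

Because the covariance $V$ of the $\beta$-update does not depend on $z$, its Cholesky factor is computed once outside the loop; only the mean changes from step to step.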

Paper Structure

This paper contains 43 sections, 24 theorems, 175 equations, 7 figures, 1 table, 4 algorithms.

Key Result

Theorem 3.2

Under the bounded-covariance assumption and the bounded-entries assumption, for any $\eta \ge 1$ and $\epsilon \in (0,1)$, the mixing time of ProbitDA with an $\eta$-warm start and error tolerance $\epsilon$ satisfies an explicit non-asymptotic upper bound in which $c$ is a universal constant.
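
The warm start and mixing time in Theorem 3.2 follow the usual conventions for Markov chain Monte Carlo analysis. A standard way to write them is below; the notation ($\mu_0$ for the initial distribution, $P$ for the transition kernel, $\pi$ for the target) is assumed here, and the paper's exact definitions may differ slightly.

```latex
% mu_0: initial distribution, P: transition kernel, pi: target (notation assumed)
\mu_0 \text{ is } \eta\text{-warm w.r.t. } \pi
  \iff \mu_0(A) \le \eta\,\pi(A) \quad \text{for all measurable } A,

t_{\mathrm{mix}}(\epsilon, \eta)
  = \sup_{\mu_0\ \eta\text{-warm}}\;
    \inf\bigl\{\, t \ge 0 : \|\mu_0 P^{t} - \pi\|_{\mathrm{TV}} \le \epsilon \,\bigr\}.
```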

Figures (7)

  • Figure 1: Illustration of the transition kernels of ProbitDA, LogitDA, and LassoDA. Here, the arrow represents conditional dependency.
  • Figure 2: Simulation results for ProbitDA with imbalance factor $\Upsilon=0.6$.
  • Figure 3: Simulation results for ProbitDA with imbalance factor $\Upsilon=1$.
  • Figure 4: Simulation results for LogitDA with imbalance factor $\Upsilon=0.6$.
  • Figure 5: Simulation results for LogitDA with imbalance factor $\Upsilon=1$.
  • ...and 2 more figures
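
Figures 4 and 5 concern LogitDA. As a hedged illustration of its two-step transition, here is a minimal Python sketch of the Pólya-Gamma sampler of Polson, Scott, and Windle (2013). It assumes a Gaussian prior $\beta \sim \mathcal{N}(0, vI)$ with $v$ = `prior_var` and replaces the exact Pólya-Gamma draw with a truncated series approximation; the function names and parameters are hypothetical, not the authors' code.

```python
import numpy as np


def sample_pg1_approx(c, rng, n_terms=200):
    """Approximate omega_i ~ PG(1, c_i) by truncating the infinite-sum representation
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Exp(1).
    Truncation makes draws slightly too small: an approximation, not an exact sampler.
    """
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.standard_exponential((c.size, n_terms))  # Gamma(1, 1) = Exp(1)
    denom = (k - 0.5) ** 2 + c[:, None] ** 2 / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def logit_da(X, y, n_iter, prior_var=100.0, seed=0):
    """Polya-Gamma DA Gibbs sampler; assumes the prior beta ~ N(0, prior_var * I)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    kappa = np.asarray(y, dtype=float) - 0.5  # kappa_i = y_i - 1/2
    prior_prec = np.eye(d) / prior_var
    beta = np.zeros(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i^T beta)
        omega = sample_pg1_approx(X @ beta, rng)
        # Step 2: beta | omega ~ N(V X^T kappa, V), V = (X^T Omega X + prior precision)^{-1}
        precision = X.T @ (omega[:, None] * X) + prior_prec
        chol = np.linalg.cholesky(precision)  # precision = L L^T
        mean = np.linalg.solve(precision, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(d))
        draws[t] = beta
    return draws
```

Unlike ProbitDA, the covariance of the $\beta$-update depends on the latent $\omega$, so it is refactored at every step.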

Theorems & Definitions (39)

  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 3.6
  • Lemma 5.1
  • Lemma 5.2: Milman (2012)
  • Lemma 5.3: Modified version of Lovász and Simonovits (1993)
  • Remark
  • Lemma 5.4: Chewi (2023) and Dwivedi et al. (2019)
  • Lemma 5.5: Chen et al. (2020)
  • ...and 29 more
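
For orientation, the classical conductance argument behind bounds of this kind can be written as follows. This is the standard Lovász and Simonovits (1993) form for lazy, reversible chains, given as a hedged sketch; the paper's Lemma 5.3 is a modified version whose exact statement and constants may differ. It shows where a $\log(\eta/\epsilon)$ factor, as in the LassoDA rate, comes from; the sharper $\log(\log\eta/\epsilon)$ factor for ProbitDA and LogitDA needs a finer argument than this basic bound.

```latex
% Conductance of a Markov kernel P with stationary distribution pi
\Phi = \inf_{A:\,0<\pi(A)\le 1/2}
       \frac{\int_A P(x, A^{c})\,\pi(\mathrm{d}x)}{\pi(A)}

% Lazy, reversible P and an eta-warm start mu_0
\|\mu_0 P^{t} - \pi\|_{\mathrm{TV}}
  \le \sqrt{\eta}\,\Bigl(1 - \tfrac{\Phi^{2}}{2}\Bigr)^{t}
  \le \sqrt{\eta}\, e^{-t\Phi^{2}/2}

% Setting the right-hand side equal to epsilon and solving for t
t_{\mathrm{mix}}(\epsilon, \eta)
  \le \frac{2}{\Phi^{2}}\log\frac{\sqrt{\eta}}{\epsilon}
  = \mathcal{O}\Bigl(\Phi^{-2}\log\frac{\eta}{\epsilon}\Bigr)
```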