Fast Mixing of Data Augmentation Algorithms: Bayesian Probit, Logit, and Lasso Regression

Holden Lee; Kexin Zhang

TL;DR

The paper advances the non-asymptotic understanding of data-augmentation (DA) Gibbs samplers for Bayesian probit, logit, and Lasso models by deriving polynomial mixing-time bounds via a conductance/isoperimetry framework. It provides explicit rates under an $\eta$-warm start: for ProbitDA and LogitDA, $t_{\text{mix}} = \mathcal{O}\left(nd\log\left(\frac{\log\eta}{\varepsilon}\right)\right)$, and for LassoDA, $t_{\text{mix}} = \mathcal{O}\left(d^2(d\log d+n\log n)^2\log\left(\frac{\eta}{\varepsilon}\right)\right)$. For ProbitDA and LogitDA, the bound improves to $\tilde{\mathcal{O}}(n+d)$ with high probability when the data are independent and bounded, sub-Gaussian, or log-concave; feasible-start variants yield additional refinements. The approach reduces the DA analysis to one-step overlap bounds and employs a conductance-profile enhancement to handle the non-log-concave Lasso target via a log-concave transformation. Numerical experiments corroborate the theoretical bounds, illustrating tight dependence on $n$ and the nuanced interaction between $n$ and $d$ for LassoDA. Overall, the results establish fast, non-asymptotic mixing guarantees for widely used Bayesian DA algorithms and contrast favorably with general-purpose sampling methods in high-dimensional regimes.

Abstract

Despite the widespread use of the data augmentation (DA) algorithm, the theoretical understanding of its convergence behavior remains incomplete. We prove the first non-asymptotic polynomial upper bounds on the mixing times of three important DA algorithms: the DA algorithms for Bayesian Probit regression (Albert and Chib, 1993; ProbitDA), Bayesian Logit regression (Polson, Scott, and Windle, 2013; LogitDA), and Bayesian Lasso regression (Park and Casella, 2008; Rajaratnam et al., 2015; LassoDA). Concretely, we demonstrate that with an $\eta$-warm start, parameter dimension $d$, and sample size $n$, ProbitDA and LogitDA require $\mathcal{O}\left(nd\log \left(\frac{\log \eta}{\varepsilon}\right)\right)$ steps to obtain samples with at most $\varepsilon$ TV error, whereas LassoDA requires $\mathcal{O}\left(d^2(d\log d +n \log n)^2 \log \left(\frac{\eta}{\varepsilon}\right)\right)$ steps. The results are generally applicable to settings with large $n$ and large $d$, including settings with highly imbalanced response data in Probit and Logit regression. The proofs are based on Markov chain conductance and isoperimetric inequalities. Assuming that the data are independently generated from a bounded, sub-Gaussian, or log-concave distribution, we improve the guarantees for ProbitDA and LogitDA to $\tilde{\mathcal{O}}(n+d)$ with high probability, and compare them with the best known guarantees for Langevin Monte Carlo and the Metropolis-adjusted Langevin algorithm. We also discuss the mixing times of the three algorithms under feasible initialization.
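
To make the object of these bounds concrete, the following is a minimal sketch of one DA chain of the ProbitDA type, following the standard Albert and Chib (1993) two-block Gibbs scheme. It is an illustration under stated assumptions, not the paper's implementation: the prior $\beta \sim N(0, \texttt{prior\_var}\cdot I)$, the function names, and the naive inverse-CDF truncated-normal sampler are all choices made here for brevity, and the code uses only the Python standard library. Each iteration of the loop is one "step" in the sense counted by a mixing-time bound.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else (-inf, 0].

    Naive inverse-CDF sampler: adequate for moderate |mean| only.
    """
    p0 = _STD.cdf(-mean)  # P(N(mean, 1) <= 0)
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + _STD.inv_cdf(u)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    d = len(y)
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, d))) / L[i][i]
    return x


def probit_da(X, y, n_steps, prior_var=100.0, seed=0):
    """Run n_steps of the two-block Gibbs sampler; returns the list of beta draws.

    Assumed prior (not taken from the paper): beta ~ N(0, prior_var * I).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Posterior precision Q = X^T X + I / prior_var does not change across steps.
    Q = [[sum(X[i][a] * X[i][b] for i in range(n)) + (a == b) / prior_var
          for b in range(d)] for a in range(d)]
    L = cholesky(Q)
    beta = [0.0] * d
    draws = []
    for _ in range(n_steps):
        # Block 1: z_i | beta, y_i ~ N(x_i^T beta, 1), truncated by the label y_i.
        z = [sample_truncated_normal(sum(xi[a] * beta[a] for a in range(d)),
                                     yi == 1, rng)
             for xi, yi in zip(X, y)]
        # Block 2: beta | z ~ N(Q^{-1} X^T z, Q^{-1}).
        xtz = [sum(X[i][a] * z[i] for i in range(n)) for a in range(d)]
        mean = solve_upper_t(L, solve_lower(L, xtz))
        # L^{-T} eps has covariance (L L^T)^{-1} = Q^{-1} when eps ~ N(0, I).
        noise = solve_upper_t(L, [rng.gauss(0.0, 1.0) for _ in range(d)])
        beta = [m + e for m, e in zip(mean, noise)]
        draws.append(beta)
    return draws
```

A call such as `probit_da(X, y, n_steps=1000)` takes `X` as a list of `n` rows of length `d` and `y` as a list of 0/1 labels. The chain starts from `beta = 0`, which is a convenience of this sketch rather than the warm or feasible start assumed by the bounds above.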
