A General Continuous-Time Formulation of Stochastic ADMM and Its Variants
Chris Junchi Li
TL;DR
The paper develops a unified continuous-time analysis for a broad class of stochastic ADMM algorithms (G-sADMM). It shows that, for a large penalty parameter rho and under proper scaling, the iterates converge weakly to the solution of a stochastic differential equation whose diffusion term matches that of preconditioned SGD. The analysis shows that the relaxation parameter alpha must lie in (0,2) for the residual to converge, and that the fluctuations scale as rho^{-1/2}. The results cover standard, linearized, and gradient-based ADMM variants, quantify the balance between drift and diffusion through the diffusion term and the matrix \widehat{M}, and give practical guidance for parameter selection, including adaptive strategies. Numerical experiments on toy problems and on generalized ridge and lasso regression confirm that the stochastic modified equation (SME) tracks both the mean behavior and the stochastic fluctuations of G-sADMM, and they illustrate how alpha, c, and the batch size affect convergence and variance. This continuous-time perspective offers a principled framework for understanding and tuning stochastic ADMM methods in large-scale settings.
Abstract
Stochastic versions of the alternating direction method of multipliers (ADMM) and its variants play a key role in many modern large-scale machine learning problems. In this work, we introduce a unified algorithmic framework called generalized stochastic ADMM and study it through a continuous-time analysis. The generalized framework covers many stochastic ADMM variants, including standard, linearized, and gradient-based ADMM. Our continuous-time analysis provides new insights into stochastic ADMM and its variants, and we rigorously prove that, under proper scaling, the trajectory of stochastic ADMM converges weakly to the solution of a stochastic differential equation with small noise. Our analysis also provides a theoretical explanation of why the relaxation parameter should be chosen between 0 and 2.
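To make the role of the relaxation parameter concrete, the following is a minimal sketch of deterministic, scaled-form relaxed ADMM on a scalar lasso-type problem. It is not the paper's G-sADMM: there is no stochastic gradient, and the problem itself, the function name `relaxed_admm_scalar`, and the values of `a`, `lam`, and `rho` are assumptions chosen only for illustration. The update uses the standard over-relaxation form, in which alpha*x + (1 - alpha)*z replaces x in the z-update and the dual update.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def relaxed_admm_scalar(alpha, a=3.0, lam=1.0, rho=1.0, iters=200):
    """Relaxed ADMM (scaled form) for the illustrative toy problem
        minimize 0.5*(x - a)**2 + lam*|z|   subject to   x - z = 0.
    Returns (x, z, residual) with residual = x - z.
    All parameter values are assumptions, not taken from the paper.
    """
    x, z, u = 0.0, 0.0, 0.0  # u is the scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic subproblem
        x = (a + rho * (z - u)) / (1.0 + rho)
        # relaxation step: mix the new x with the previous z
        x_hat = alpha * x + (1.0 - alpha) * z
        # z-update: proximal step on lam*|z| with threshold lam/rho
        z_new = soft_threshold(x_hat + u, lam / rho)
        # dual update uses the relaxed point x_hat
        u = u + x_hat - z_new
        z = z_new
    return x, z, x - z


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0, 1.5):
        x, z, r = relaxed_admm_scalar(alpha)
        print(f"alpha={alpha:.1f}  x={x:.6f}  z={z:.6f}  residual={r:.2e}")
```

On this toy problem, alpha in {0.5, 1.0, 1.5} drives the residual x - z to zero at the soft-thresholded solution, while alpha = 0 leaves z stuck at its initial value with a nonzero residual. The sketch does not probe the upper end of the interval, where the behavior depends on the problem and on rho.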
