A General Continuous-Time Formulation of Stochastic ADMM and Its Variants

Chris Junchi Li

TL;DR

The paper develops a unified continuous-time analysis for a broad class of stochastic ADMM algorithms (G-sADMM). It shows that, for large $\rho$ and under proper scaling, the iterates converge weakly to the solution of a stochastic differential equation, the stochastic modified equation (SME), whose diffusion matches that of preconditioned SGD. The analysis reveals that the relaxation parameter $\alpha$ must lie in $(0,2)$ to ensure residual convergence, and it shows that fluctuations scale as $\rho^{-1/2}$. The results unify standard, linearized, and gradient-based ADMM variants, quantify the drift-diffusion balance via a diffusion term and a matrix $\widehat{M}$, and provide practical guidance for parameter selection, including adaptive strategies. Numerical experiments on toy problems and generalized ridge/lasso regression validate the SME as a faithful proxy for both the mean behavior and the stochastic fluctuations of G-sADMM, and illustrate the effects of $\alpha$, $c$, and batch size on convergence and variance. This continuous-time perspective offers a principled framework for understanding and tuning stochastic ADMM methods in large-scale settings.

Abstract

Stochastic versions of the alternating direction method of multipliers (ADMM) and its variants play a key role in many modern large-scale machine learning problems. In this work, we introduce a unified algorithmic framework called generalized stochastic ADMM and investigate its continuous-time analysis. The generalized framework includes many stochastic ADMM variants, such as standard, linearized, and gradient-based ADMM. Our continuous-time analysis provides new insights into stochastic ADMM and its variants, and we rigorously prove that, under proper scaling, the trajectory of stochastic ADMM weakly converges to the solution of a stochastic differential equation with small noise. Our analysis also provides a theoretical explanation of why the relaxation parameter should be chosen between 0 and 2.
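
To make the role of the relaxation parameter and the small-noise regime concrete, here is a minimal Python sketch of a relaxed stochastic ADMM iteration. It is not the paper's G-sADMM scheme: it uses the textbook over-relaxed ADMM in scaled form, and the toy problem (minimize $E[\tfrac12 (x-\xi)^2] + \tfrac{\lambda}{2} z^2$ subject to $x - z = 0$, with $\xi \sim N(a, \sigma^2)$), the function name `relaxed_sadmm`, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def relaxed_sadmm(alpha, rho, a=1.0, lam=1.0, sigma=0.0, n_iter=2000, seed=0):
    """Over-relaxed stochastic ADMM (scaled form) on an assumed toy problem:

        min_x  E[0.5 * (x - xi)^2] + 0.5 * lam * z^2   s.t.  x - z = 0,

    with xi ~ N(a, sigma^2). Returns the list of z iterates.
    """
    rng = random.Random(seed)
    z, u = 0.0, 0.0  # u is the scaled dual variable
    zs = []
    for _ in range(n_iter):
        xi = rng.gauss(a, sigma)                  # one noisy sample per step
        x = (xi + rho * (z - u)) / (1.0 + rho)    # x-update (closed form)
        x_hat = alpha * x + (1.0 - alpha) * z     # relaxation step
        z = rho * (x_hat + u) / (lam + rho)       # z-update (closed form)
        u = u + x_hat - z                         # dual update
        zs.append(z)
    return zs


x_star = 1.0 / (1.0 + 1.0)  # minimizer a / (1 + lam) of the toy problem

# Noise-free runs: several relaxation values inside (0, 2) reach x_star.
for alpha in (0.5, 1.0, 1.5):
    print(alpha, abs(relaxed_sadmm(alpha, rho=10.0)[-1] - x_star))

# Noisy runs: spread of the iterates around x_star for increasing rho.
for rho in (25.0, 100.0, 400.0):
    tail = relaxed_sadmm(1.0, rho, sigma=1.0, n_iter=45000)[5000:]
    print(rho, statistics.pstdev(tail))
```

On this toy problem the printed standard deviations should shrink by roughly a factor of two each time `rho` quadruples, which is the kind of $\rho^{-1/2}$ fluctuation scaling the summary describes. The sketch does not test values of $\alpha$ outside $(0,2)$, and it makes no claim about how the paper's own experiments were set up.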

Paper Structure (30 sections, 5 theorems, 60 equations, 13 figures)

Key Result

Theorem 2

Suppose there exist a constant $K_{0}$ and a function $K_{2}(x) \in \mathcal{F}$ such that the theorem's conditions on the first three moments of the error $\Delta-\bar{\Delta}$ hold for any $i, j, l\in\{1,2, \ldots, d\}$ and any $x \in \mathbb{R}^{d}$. Then $\{x_{k}\}$ converges weakly to $\{X_{t}\}$ with order 1. In light of this theorem, Eq. (3.5) is called the stochastic modified equation (SME).
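
As an illustration of what first-order weak convergence means in practice, the following Python sketch measures the weak error of a discrete scheme against a continuous-time limit. It does not use the paper's SME or G-sADMM: the Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$, the Euler-Maruyama discretization, the test function $\varphi(x) = x^2$, and all function names are assumptions chosen because both second moments can be computed without Monte Carlo sampling. The step sizes $\epsilon = 2^{-m} T$ with $T = 0.5$ mirror the convention in the caption of Figure 1.

```python
import math


def em_second_moment(eps, T, theta=1.0, sigma=1.0, x0=1.0):
    """E[x_n^2] for Euler-Maruyama x_{k+1} = (1 - theta*eps) x_k + sigma*sqrt(eps)*xi_k,
    with xi_k ~ N(0, 1), obtained by iterating the exact moment recursion."""
    n = round(T / eps)
    m2 = x0 * x0
    for _ in range(n):
        m2 = (1.0 - theta * eps) ** 2 * m2 + sigma ** 2 * eps
    return m2


def exact_second_moment(T, theta=1.0, sigma=1.0, x0=1.0):
    """E[X_T^2] for the Ornstein-Uhlenbeck process started at x0."""
    decay = math.exp(-2.0 * theta * T)
    return x0 * x0 * decay + sigma ** 2 * (1.0 - decay) / (2.0 * theta)


T = 0.5
ms = range(4, 10)
# Weak error |E phi(x_n) - E phi(X_T)| for phi(x) = x^2 at each step size.
errors = [abs(em_second_moment(2.0 ** -m * T, T) - exact_second_moment(T)) for m in ms]
# Empirical order: log2 of the error ratio when the step size is halved.
orders = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]

for m, err in zip(ms, errors):
    print(f"m={m}  eps={2.0 ** -m * T:.6f}  weak error={err:.3e}")
print("empirical orders:", [round(p, 3) for p in orders])
```

Halving the step size roughly halves the weak error, so the empirical orders sit close to 1. This is the same kind of diagnostic a first-order weak approximation plot reports, here on a problem where the expectations are available in closed form.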

Figures (13)

  • Figure 1: The match between the stochastic ADMM and the SME, and the verification of the first-order weak approximation. The result is based on the average of $10^{5}$ independent runs. The step size is $\epsilon=1 / \rho=2^{-m} T$ with $T=0.5$ in (b). See Section 5 for details.
  • Figure 2: Illustration of the order of the residual $r_{k}$ and the $\alpha$-residual $r_{k}^{\alpha}$ defined in Eq. (4.11).
  • Figure 3: The match between the stochastic ADMM and the SME, and the verification of the first-order weak approximation. The result is based on the average of $10^{5}$ independent runs. The step size is $\epsilon=1 / \rho=2^{-m} T$ with $T=0.5$ in (b). See Section 5 for details.
  • Figure 4: 400 sample trajectories from the stochastic ADMM (left) and the SME (right).
  • Figure 5: The expected error $x_{k}-x_{*}$ relative to the true minimizer as $\alpha$ varies in $(0,2)$. The result is based on the average of 10000 runs of stochastic ADMM sequences.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Definition 1: Weak convergence
  • Theorem 2: Milstein's weak convergence theorem
  • Theorem 3: SME for G-sADMM
  • Remark 4
  • Corollary 5
  • Theorem 6
  • Proposition 7
  • Remark 8
  • Remark 9
  • Remark 10