CoCoA Is ADMM: Unifying Two Paradigms in Distributed Optimization

Runxiong Wu, Dong Liu, Xueqin Wang, Andi Wang

TL;DR

The paper tackles distributed ERM across $K$ machines by examining two main algorithm families: CoCoA and ADMM variants. It reveals that these methods can be cast into a single primal-dual update form via a saddle-point reformulation, with the server updating the primal variable $w$ and each node updating its local dual block $v_{[k]}$, while data privacy and communication are managed through encoders. A key finding is that CoCoA is a special case of proximal ADMM on the dual (and consensus ADMM is equivalent to proximal ADMM), and linearized ADMM variants extend the framework with closed-form proximal steps; tuning the augmented-Lagrangian parameter crucially affects performance. The paper provides a unified convergence analysis with $O(1/T)$ ergodic rates and validates the theory with extensive experiments showing that ADMM variants typically outperform CoCoA under appropriate parameter settings, illustrating a versatile and scalable approach to distributed learning.

Abstract

We consider primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, proximal ADMM, and linearized ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This discovery reveals key connections between the two classes of algorithms: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is equivalent to a proximal ADMM algorithm. This discovery provides insight into how we can easily enable the ADMM variants to outperform the CoCoA variants by adjusting the augmented Lagrangian parameter. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Extensive simulation studies and real-world data analysis support our theoretical findings.
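To make the role of the augmented Lagrangian parameter concrete, here is a minimal sketch of textbook scaled-form consensus ADMM applied to ridge regression split across $K$ nodes. It is not the paper's unified primal-dual update form, nor its CoCoA or proximal ADMM variants. The function name `consensus_admm_ridge`, the toy data, and the choice of squared loss are assumptions made for illustration only; `rho` plays the role of the augmented Lagrangian parameter.

```python
import numpy as np


def consensus_admm_ridge(A_blocks, b_blocks, lam=1.0, rho=1.0, n_rounds=300):
    """Scaled-form consensus ADMM (illustrative sketch) for
        min_w  sum_k 0.5 * ||A_k w - b_k||^2 + 0.5 * lam * ||w||^2,
    where node k holds (A_k, b_k). Returns the server's consensus iterate z.
    """
    K = len(A_blocks)
    d = A_blocks[0].shape[1]
    z = np.zeros(d)        # consensus variable kept by the server
    W = np.zeros((K, d))   # local primal copies, one row per node
    U = np.zeros((K, d))   # scaled dual variables, one row per node

    # Each node's linear system is fixed across rounds, so build it once.
    lhs = [A.T @ A + rho * np.eye(d) for A in A_blocks]
    Atb = [A.T @ b for A, b in zip(A_blocks, b_blocks)]

    for _ in range(n_rounds):
        # Node step (parallel in a real deployment): regularized least squares.
        for k in range(K):
            W[k] = np.linalg.solve(lhs[k], Atb[k] + rho * (z - U[k]))
        # Server step: closed-form minimizer of the ridge penalty plus the
        # quadratic coupling term, evaluated at the averaged node messages.
        z = K * rho * (W.mean(axis=0) + U.mean(axis=0)) / (lam + K * rho)
        # Dual step: accumulate each node's consensus violation.
        U += W - z
    return z


# Toy data (assumed for illustration): K nodes with n_k samples each.
rng = np.random.default_rng(0)
K, n_k, d, lam = 4, 50, 5, 1.0
A_blocks = [rng.standard_normal((n_k, d)) for _ in range(K)]
w_true = rng.standard_normal(d)
b_blocks = [A @ w_true + 0.1 * rng.standard_normal(n_k) for A in A_blocks]

# Centralized ridge solution, used only as a reference point.
A, b = np.vstack(A_blocks), np.concatenate(b_blocks)
w_star = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

# Compare the error after a fixed communication budget for two values of rho.
for rho in (1.0, 50.0):
    z = consensus_admm_ridge(A_blocks, b_blocks, lam=lam, rho=rho, n_rounds=30)
    print(f"rho={rho:5.1f}  error after 30 rounds: {np.linalg.norm(z - w_star):.2e}")
```

Running the loop at the end with different `rho` values under the same number of communication rounds gives a rough feel for how sensitive this kind of method is to the augmented Lagrangian parameter on a small problem.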

Paper Structure

This paper contains 36 sections, 8 theorems, 77 equations, 4 figures, 1 table.

Key Result

Proposition 1

The consensus ADMM with regularization for solving the primal problem is equivalent to the following update rule:

Figures (4)

  • Figure 1: Connections among distributed algorithms: under $\ell_2$-regularized ERM, CoCoA is equivalent to first Proximal ADMM with $\rho = \lambda^{-1}$, and Consensus ADMM is equivalent to first Proximal ADMM when $\beta K = \rho^{-1}$. Linearized consensus ADMM is equivalent to the linearized proximal ADMM.
  • Figure 2: Effect of tuning parameters on various distributed algorithms in Experiment 1.
  • Figure 3: Relative gap difference versus the number of communication rounds for various synthetic datasets when using different update rules in Experiment 2.
  • Figure 4: Relative gap differences versus the number of communication rounds for various real datasets across different models. The first row of plots illustrates the results for SVM with a ridge penalty across different datasets, while the second row shows the results for SVM with a lasso penalty across the same datasets.

Theorems & Definitions (17)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Corollary 1: Equivalence of CoCoA-PD and Proximal-1-PD
  • Corollary 2: Equivalence of Consensus ADMM and Proximal ADMM
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • proof
  • proof
  • ...and 7 more