CoCoA Is ADMM: Unifying Two Paradigms in Distributed Optimization
Runxiong Wu, Dong Liu, Xueqin Wang, Andi Wang
TL;DR
The paper studies distributed empirical risk minimization (ERM) across $K$ machines through two algorithm families: CoCoA and ADMM variants. It shows that both can be cast into a single primal-dual update form via a saddle-point reformulation. In this form the server updates the primal variable $w$, each node updates its local dual block $v_{[k]}$, and data privacy and communication are handled through encoders. A key finding is that CoCoA is a special case of proximal ADMM applied to the dual, and that consensus ADMM is equivalent to proximal ADMM. Linearized ADMM variants extend the framework with closed-form proximal steps, and the choice of the augmented Lagrangian parameter strongly affects performance. The paper gives a unified convergence analysis with $O(1/T)$ ergodic rates and supports the theory with extensive experiments showing that, with a suitably chosen augmented Lagrangian parameter, ADMM variants typically outperform CoCoA.
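The saddle-point reformulation mentioned above rests on Fenchel conjugate duality. A minimal sketch, assuming (our notation, not necessarily the paper's) convex losses $\ell_i$ on $n$ samples $x_i$ (the rows of $X$), a convex regularizer $g$, and the samples split across the $K$ machines into index sets $\mathcal{I}_k$ with row blocks $X_{[k]}$. Since every closed convex $\ell_i$ satisfies $\ell_i(z) = \max_{v}\{v z - \ell_i^*(v)\}$,

$$
\min_{w}\ \sum_{i=1}^{n} \ell_i(x_i^\top w) + g(w)
\;=\; \min_{w}\ \max_{v \in \mathbb{R}^n}\ \sum_{k=1}^{K}\Big( v_{[k]}^\top X_{[k]}\, w \;-\; \sum_{i \in \mathcal{I}_k} \ell_i^*(v_i) \Big) + g(w).
$$

The coupling term $v^\top X w$ separates into per-machine blocks $v_{[k]}^\top X_{[k]} w$, so node $k$ needs only its own data $X_{[k]}$ and dual block $v_{[k]}$, while the primal variable $w$ is shared. This is the division of labor between server and nodes described above.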
Abstract
We consider primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, proximal ADMM, and linearized ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This unified form reveals key connections between the two classes: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is equivalent to a proximal ADMM algorithm. These connections also show how the ADMM variants can easily be made to outperform the CoCoA variants by adjusting the augmented Lagrangian parameter. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Extensive simulation studies and real-world data analysis support our theoretical findings.
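To make the role of the augmented Lagrangian parameter concrete, here is a minimal sketch of textbook scaled-form consensus ADMM on a toy distributed ridge regression. It is not the paper's unified primal-dual update. The function name `consensus_admm_ridge`, the synthetic data, and the values $\rho = 1$ and $\rho = 50$ are illustrative assumptions. Each machine $k$ holds $(A_k, b_k)$ and a local copy $x_k$ of $w$, the server holds the consensus variable $z$, and $\rho$ is the augmented Lagrangian parameter.

```python
import numpy as np

def consensus_admm_ridge(A_blocks, b_blocks, lam, rho, iters):
    """Scaled-form consensus ADMM (illustrative sketch) for
        min_w  sum_k 0.5*||A_k w - b_k||^2 + 0.5*lam*||w||^2,
    where machine k holds (A_k, b_k) and a local copy x_k of w."""
    K = len(A_blocks)
    d = A_blocks[0].shape[1]
    X = np.zeros((K, d))  # local primal copies x_k
    U = np.zeros((K, d))  # scaled dual variables u_k
    z = np.zeros(d)       # consensus variable kept by the server
    # The local systems (A_k^T A_k + rho*I) stay fixed across iterations.
    M = [Ak.T @ Ak + rho * np.eye(d) for Ak in A_blocks]
    q = [Ak.T @ bk for Ak, bk in zip(A_blocks, b_blocks)]
    for _ in range(iters):
        # 1) Local x-updates (would run in parallel on the K machines).
        for k in range(K):
            X[k] = np.linalg.solve(M[k], q[k] + rho * (z - U[k]))
        # 2) Server z-update: closed-form minimizer for the ridge term.
        z = rho * (X + U).sum(axis=0) / (lam + K * rho)
        # 3) Local scaled dual updates.
        U += X - z
    residual = np.max(np.linalg.norm(X - z, axis=1))  # consensus residual
    return z, residual

# Synthetic toy data: K machines, n_k samples each, d features.
rng = np.random.default_rng(0)
K, n_k, d, lam = 4, 50, 5, 1.0
A_blocks = [rng.standard_normal((n_k, d)) for _ in range(K)]
w_true = rng.standard_normal(d)
b_blocks = [Ak @ w_true + rng.standard_normal(n_k) for Ak in A_blocks]

# Centralized ridge solution for reference: (A^T A + lam*I) w = A^T b.
A, b = np.vstack(A_blocks), np.concatenate(b_blocks)
w_central = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

for rho in (1.0, 50.0):
    z, res = consensus_admm_ridge(A_blocks, b_blocks, lam, rho, iters=30)
    err = np.linalg.norm(z - w_central)
    print(f"rho={rho:5.1f}  ||z - w*||={err:.2e}  consensus residual={res:.2e}")
```

In this toy setup the local Gram matrices $A_k^\top A_k$ have eigenvalues on the order of the local sample size, so $\rho = 50$ balances each local fit against the consensus penalty. With $\rho = 1$ the local copies $x_k$ stay far apart and the consensus residual shrinks slowly. Both settings converge eventually; only the speed differs.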
