Distributed optimization: designed for federated learning
Wenyou Guo, Ting Qu, Chunrong Pan, George Q. Huang
TL;DR
This work addresses large-scale federated learning under privacy and statistical heterogeneity constraints by formulating a consensus optimization problem and introducing Fed-DALD, a distributed augmented Lagrangian framework that supports both centralized and decentralized communication topologies. The method combines proximal relaxation with second-order (quadratic) approximations to construct surrogate objectives. Classical optimization methods such as the proximal algorithm (PA), gradient descent (GD), stochastic gradient descent (SGD), mini-batch gradient descent (MBGD), and DGD emerge as special cases, and convergence guarantees hold even when subproblems are solved inexactly. Theoretical results establish convergence of the inner DALD loop to stationary points and show that accelerated, inexact variants preserve convergence under diminishing dual residuals and increasing penalties. Numerical experiments on IID regression and non-IID MNIST classification show that Fed-DALD-DC/CC outperform FedProx, particularly in large-scale heterogeneous settings, with improved stability and scalability. Overall, the framework offers a principled, topology-agnostic approach to FL that bridges monolithic and distributed optimization theories and supports robust, privacy-preserving learning across diverse networks.
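To make the consensus formulation concrete, the following is a minimal sketch of a textbook consensus augmented Lagrangian iteration (ADMM-style) on a least-squares problem with a central aggregator. It is not the paper's Fed-DALD algorithm. The function name `consensus_al_least_squares`, the penalty `rho`, and the iteration count are assumptions chosen for illustration. Each client keeps its data `(A_i, b_i)` private and exchanges only its local model and multiplier.

```python
import numpy as np

def consensus_al_least_squares(A_list, b_list, rho=10.0, iters=300):
    """Illustrative consensus augmented Lagrangian loop (hypothetical helper).

    Solves  min sum_i 0.5 * ||A_i x_i - b_i||^2  subject to  x_i = z  for all i.
    """
    n_clients = len(A_list)
    dim = A_list[0].shape[1]
    z = np.zeros(dim)                  # shared consensus variable
    x = np.zeros((n_clients, dim))     # local model copies
    lam = np.zeros((n_clients, dim))   # multipliers for the constraints x_i = z

    for _ in range(iters):
        # Local step: each client minimizes its own augmented Lagrangian
        #   0.5*||A x - b||^2 + lam^T (x - z) + (rho/2)*||x - z||^2
        # which, for least squares, reduces to a linear system.
        for i, (A, b) in enumerate(zip(A_list, b_list)):
            H = A.T @ A + rho * np.eye(dim)
            rhs = A.T @ b - lam[i] + rho * z
            x[i] = np.linalg.solve(H, rhs)

        # Aggregation step: closed-form minimizer over z.
        z = (x + lam / rho).mean(axis=0)

        # Dual ascent on the consensus constraints.
        lam += rho * (x - z)

    return z, x
```

On a small synthetic problem, the returned `z` can be compared against the pooled least-squares solution from `np.linalg.lstsq` on the stacked data. For convex problems like this one the two should agree up to the solver tolerance. A decentralized variant would replace the averaging step with neighbor-to-neighbor exchanges.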
Abstract
Federated learning (FL), a distributed collaborative machine learning (ML) framework that operates under privacy-preserving constraints, has attracted increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both centralized and decentralized FL settings. We also develop multiple termination criteria and parameter update mechanisms to improve computational efficiency, and we prove convergence for each. By generalizing the augmented Lagrangian relaxation through proximal relaxation and quadratic approximation, our framework systematically recovers a broad class of classical unconstrained optimization methods, including the proximal algorithm, classical gradient descent, and stochastic gradient descent, among others. Notably, the convergence properties of these methods follow naturally within the proposed theoretical framework. Numerical experiments demonstrate that the proposed algorithm performs strongly in large-scale settings with significant statistical heterogeneity across clients.
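The link between proximal relaxation, quadratic approximation, and gradient methods can be illustrated with a standard textbook derivation. The notation here (objective $f$, step size $\eta > 0$, symmetric positive semidefinite curvature model $B_k$, stochastic gradient $g_k$) is assumed for illustration and may differ from the paper's own surrogate construction.

```latex
% Proximal algorithm: minimize f plus a proximal term around the current iterate.
x^{k+1} = \arg\min_{x} \Big\{ f(x) + \tfrac{1}{2\eta}\,\|x - x^{k}\|^{2} \Big\}

% Replace f by its first-order model at x^k; the minimizer is a gradient descent step.
x^{k+1} = \arg\min_{x} \Big\{ f(x^{k}) + \nabla f(x^{k})^{\top}(x - x^{k})
          + \tfrac{1}{2\eta}\,\|x - x^{k}\|^{2} \Big\}
        = x^{k} - \eta\,\nabla f(x^{k})

% Keep a quadratic model with curvature B_k; the step is preconditioned.
x^{k+1} = x^{k} - \big(B_{k} + \tfrac{1}{\eta} I\big)^{-1} \nabla f(x^{k})

% Replace the gradient by an unbiased estimate g_k (one sample or a mini-batch).
x^{k+1} = x^{k} - \eta\, g_{k}, \qquad \mathbb{E}\,[\,g_{k} \mid x^{k}\,] = \nabla f(x^{k})
```

Each line follows by setting the gradient of the bracketed surrogate to zero. Setting $B_k = 0$ in the third line recovers plain gradient descent, and the last line corresponds to stochastic or mini-batch gradient descent depending on how $g_k$ is sampled.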
