Distributed optimization: designed for federated learning

Wenyou Guo, Ting Qu, Chunrong Pan, George Q. Huang

TL;DR

This work addresses large-scale federated learning under privacy and statistical heterogeneity constraints by formulating a consensus optimization problem and introducing Fed-DALD, a distributed augmented Lagrangian framework that supports both centralized and decentralized communication topologies. The method unifies proximal relaxation and second-order approximations to construct surrogate objectives, from which classical optimization methods such as PA, GD, SGD, MBGD, and DGD emerge as special cases, while providing convergence guarantees even under inexact subproblem solves. Theoretical results establish convergence of the inner DALD loop to stationary points and demonstrate accelerated, inexact variants that preserve convergence with diminishing dual residuals and increasing penalties. Numerical experiments on IID regression and non-IID MNIST classification show that Fed-DALD-DC/CC outperform FedProx, particularly in large-scale heterogeneous settings, with improved stability and scalability. Overall, the framework offers a principled, topology-agnostic approach to FL that bridges monolithic and distributed optimization theories and supports robust, privacy-preserving learning across diverse networks.
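As a small illustration of how a classical method can fall out of a proximal, linearized surrogate, consider the standard textbook identity below. It is a sketch under assumed notation ($f$ for a smooth objective, $x^k$ for the current iterate, $\eta > 0$ for a step size), not the paper's exact surrogate construction:

$$
x^{k+1} = \arg\min_{x} \left\{ f(x^k) + \nabla f(x^k)^\top (x - x^k) + \frac{1}{2\eta}\,\lVert x - x^k \rVert^2 \right\} = x^k - \eta\,\nabla f(x^k).
$$

Setting the gradient of the bracketed expression to zero gives $\nabla f(x^k) + \tfrac{1}{\eta}(x - x^k) = 0$, which is the gradient descent update. Replacing $\nabla f(x^k)$ with a single-sample or mini-batch gradient estimate yields the SGD and MBGD updates in the same way.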

Abstract

Federated learning (FL), as a distributed collaborative machine learning (ML) framework under privacy-preserving constraints, has garnered increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both centralized and decentralized FL settings. Furthermore, we develop multiple termination criteria and parameter update mechanisms to enhance computational efficiency, accompanied by rigorous theoretical guarantees of convergence. By generalizing the augmented Lagrangian relaxation through the incorporation of proximal relaxation and quadratic approximation, our framework systematically recovers a broad class of classical unconstrained optimization methods, including the proximal algorithm, classical gradient descent, and stochastic gradient descent, among others. Notably, the convergence properties of these methods can be naturally derived within the proposed theoretical framework. Numerical experiments demonstrate that the proposed algorithm exhibits strong performance in large-scale settings with significant statistical heterogeneity across clients.
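To make the consensus formulation concrete, the following is a minimal sketch of a generic augmented Lagrangian consensus loop (the standard ADMM-style iteration) with a central aggregator. It is not the paper's Fed-DALD algorithm. The scalar quadratic client losses $f_i(x) = \tfrac{1}{2} a_i (x - c_i)^2$, the function name `consensus_al`, and the fixed penalty `rho` are all assumptions chosen so that each local subproblem has a closed-form solution.

```python
def consensus_al(a, c, rho=1.0, iters=500):
    """Minimise sum_i 0.5 * a[i] * (x - c[i])**2 subject to x_i = z.

    Hypothetical toy setup: each client i holds one scalar quadratic loss,
    and a server holds the shared model z.
    """
    n = len(a)
    x = [0.0] * n    # local client models x_i
    lam = [0.0] * n  # dual variables for the constraints x_i = z
    z = 0.0          # shared (server) model
    for _ in range(iters):
        # Local step: each client minimises
        #   f_i(x) + lam_i * (x - z) + (rho / 2) * (x - z)**2
        # which has a closed-form solution for a quadratic f_i.
        x = [(a[i] * c[i] - lam[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # Aggregation step: the server minimises the augmented Lagrangian
        # over z, i.e. averages the dual-shifted local models.
        z = sum(x[i] + lam[i] / rho for i in range(n)) / n
        # Dual step: ascent on the remaining disagreement x_i - z.
        lam = [lam[i] + rho * (x[i] - z) for i in range(n)]
    return z, x, lam


# Three clients with different curvatures and targets (a heterogeneous toy case).
a = [1.0, 2.0, 4.0]
c = [0.0, 3.0, -1.0]
z, x, lam = consensus_al(a, c)

# Closed-form minimiser of the pooled objective, for comparison.
z_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)
```

In this toy problem the shared model `z` approaches the pooled minimiser `z_star` while each client only ever evaluates its own loss. That is the privacy-motivated structure the abstract describes, although the paper's termination criteria and penalty updates are not modelled here.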

Paper Structure

This paper contains 25 sections, 114 equations, 5 figures, 2 tables, 3 algorithms.

Figures (5)

  • Figure 1: Communication Topologies of Centralized and Decentralized Federated Learning
  • Figure 2: Hierarchical Networks for Client Coordination Sequences
  • Figure 3: Hierarchical Matrices for Client Coordination Sequences
  • Figure 4: Geometric View of the Proximal Framework
  • Figure 5: Performance Comparison of Algorithms at Varying Client Scales with $\lambda=10^{-3}$