Federated Learning Using Three-Operator ADMM

Shashi Kant; José Mairton B. da Silva; Gabor Fodor; Bo Göransson; Mats Bengtsson; Carlo Fischione

TL;DR

This paper tackles federated learning with data available both on edge devices and at a server colocated with the base station, introducing a three-operator ADMM framework (TOP-ADMM) to exploit server data alongside device data. It develops FedTOP-ADMM I/II, two variants that generalize FedADMM by incorporating a smooth server loss $h$ and a proximal term $g$, and provides convergence guarantees under general convexity. The approach yields substantial gains in communication efficiency—up to about 33% fewer communication rounds to reach a target accuracy—while maintaining or improving accuracy on real datasets (MNIST, CIFAR-10/100) under both i.i.d. and non-i.i.d. partitions. This enables effective edge learning by leveraging server-side data, with practical impact for 5G/6G edge networks where base stations hold rich data resources.
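
As a rough sketch of the setting summarized above, the three-operator consensus problem can be written in the generic form below. Only $h$ (smooth server loss) and $g$ (proximal term) are named in this summary; the symbols $f_m$ (loss of user $m$), $M$ (number of users), $x_m$ (local model), and $z$ (global model) are notation assumed here for illustration, and the paper's exact formulation may include additional scaling factors.

```latex
\begin{equation*}
\min_{\{x_m\}_{m=1}^{M},\, z} \;\; \sum_{m=1}^{M} f_m(x_m) + g(z) + h(z)
\quad \text{subject to} \quad x_m = z, \quad m = 1, \ldots, M .
\end{equation*}
```

Setting $h \equiv 0$ in this generic form leaves the classical two-operator consensus problem, which is consistent with the statement that the proposed variants generalize FedADMM.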

Abstract

Federated learning (FL) has emerged as an instance of the distributed machine learning paradigm that avoids transmitting data generated on the users' side. Although data are not transmitted, edge devices must cope with limited communication bandwidth, data heterogeneity, and straggler effects caused by the limited computational resources of users' devices. A prominent approach to overcoming these difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). FL algorithms, including FedADMM, commonly assume that the global model is learned using only data on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and to have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than using only user datasets. Specifically, we show that merely applying FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model in parallel with the edge devices. Our numerical experiments indicate that FedTOP-ADMM achieves a substantial gain of up to 33% in communication efficiency in reaching a desired test accuracy compared with FedADMM, including FedADMM with a virtual user on the edge server.
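
To make the "classical two-operator consensus ADMM" mentioned in the abstract concrete, here is a minimal sketch of the textbook scaled-form consensus ADMM on a toy distributed least-squares problem. It is not the paper's FedADMM or FedTOP-ADMM: the quadratic local losses, the penalty `rho`, the function names, and the synthetic data are all assumptions chosen for illustration, and there is no server loss or proximal term.

```python
import numpy as np


def make_toy_problem(seed=0, num_users=4, rows=20, dim=3):
    """Hypothetical synthetic data: each user m holds (A_m, b_m)."""
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(dim)
    A_list = [rng.standard_normal((rows, dim)) for _ in range(num_users)]
    b_list = [A @ x_true + 0.01 * rng.standard_normal(rows) for A in A_list]
    return A_list, b_list


def consensus_admm_least_squares(A_list, b_list, rho=10.0, num_rounds=500):
    """Scaled-form consensus ADMM for sum_m 0.5 * ||A_m x - b_m||^2."""
    num_users = len(A_list)
    dim = A_list[0].shape[1]
    z = np.zeros(dim)                                # global (server) model
    u = [np.zeros(dim) for _ in range(num_users)]    # scaled dual variables

    # Each user's subproblem is a ridge-like linear system with a fixed matrix.
    lhs = [A.T @ A + rho * np.eye(dim) for A in A_list]
    rhs = [A.T @ b for A, b in zip(A_list, b_list)]

    for _ in range(num_rounds):
        # Local step (parallel across users): minimize local loss + penalty.
        x = [np.linalg.solve(lhs[m], rhs[m] + rho * (z - u[m]))
             for m in range(num_users)]
        # Server step: with no regularizer, the global model is an average.
        z = np.mean([x[m] + u[m] for m in range(num_users)], axis=0)
        # Dual step: accumulate each user's disagreement with the global model.
        u = [u[m] + x[m] - z for m in range(num_users)]
    return z


A_list, b_list = make_toy_problem()
z_admm = consensus_admm_least_squares(A_list, b_list)
# Centralized reference: solve the stacked least-squares problem directly.
z_central = np.linalg.lstsq(np.vstack(A_list), np.concatenate(b_list), rcond=None)[0]
print(np.max(np.abs(z_admm - z_central)))
```

In this sketch each round involves one local solve per user and one averaging step at the server, which is the communication pattern that the federated variants discussed in the abstract build on.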

Paper Structure (21 sections, 10 theorems, 58 equations, 10 figures, 1 table, 1 algorithm)

Introduction
Motivation for Learning Model on the Edge Server
Contributions
State of the art
Related Works on Federated Learning
Related Works on Operator/Proximal Splitting
Notation and Paper Organization
Introduction to the TOP-ADMM Algorithm
Classical Consensus ADMM
Consensus TOP-ADMM
Federated Learning using TOP-ADMM
FedTOP-ADMM: Communication-Efficient Algorithm
Comparison Among ADMM, FedADMM, TOP-ADMM, and FedTOP-ADMM
Connections with Existing Works on FL
Numerical Results
...and 6 more sections

Key Result

Theorem 1

Consider a problem given in eqn_chD:general_consensus_top_admm__generic_form with at least one solution and a suitable step-size $\tau \! \in \! \mathbb{R}_{\geq 0}$. Assume subproblems eqn_chD:update_xm__step1_parallel__general_top_admm__prox__for_convergence and eqn_chD:update_z__step2__general_to at any limit point, converges to a KKT stationary point of eqn_chD:general_consensus_top_admm__gene

Figures (10)

Figure 1: Illustration of FL architecture, with the new scenario investigated in this paper of a dataset available on the edge server.
Figure 2: Comparison of convergence behaviour between FedADMM and FedTOP-ADMM for the distributed sparse logistic regression problem \ref{eqn_chD:distributed_l1_logistic_regression}.
Figure 3: Examples of MNIST handwritten digits without scaling and with two different scaling approaches.
Figure 4: Convergence analysis of the existing FedAvg, FedProx, and FedADMM algorithms and our proposed FedTOP-ADMM I/II algorithms for various hyperparameters under $J\!=\!10$ local iterations.
Figure 5: Comparison of FedTOP-ADMM I/II with FedADMM and FedADMM-VC under $J\!=\!10$.
...and 5 more figures

Theorems & Definitions (26)

Theorem 1: TOP-ADMM
proof
Remark 1
Remark 2
Theorem 2: Global convergence of FedTOP-ADMM algorithm
proof
Definition 1: $L$-smooth function [Bauschke:2011, Beck2017]
Definition 2: Subgradient [Bauschke:2011]
Definition 3: Proximal mapping [Parikh2013, Beck2017]
Lemma 1
...and 16 more
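
As an illustration of the proximal mapping listed in Definition 3, the sketch below uses the standard definition $\operatorname{prox}_{\lambda g}(v) = \arg\min_x \{ \lambda g(x) + \tfrac{1}{2}\|x - v\|_2^2 \}$ with $g = \|\cdot\|_1$, for which the mapping is the well-known soft-thresholding operator. The choice of the $\ell_1$ norm (suggested by the sparse logistic regression problem in Figure 2), the weight `lam`, and the function names are assumptions made here; the paper's own definition may be parameterized differently.

```python
import numpy as np


def prox_l1(v, lam):
    """Proximal mapping of lam * ||x||_1 evaluated at v (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def prox_objective(x, v, lam):
    """Objective that the proximal mapping minimizes over x."""
    return lam * np.sum(np.abs(x)) + 0.5 * np.sum((x - v) ** 2)


v = np.array([3.0, -0.5, 1.0, -2.0])
p = prox_l1(v, lam=1.0)

# Sanity check: no randomly perturbed point should achieve a lower objective.
rng = np.random.default_rng(0)
trials = p + 0.1 * rng.standard_normal((1000, v.size))
best_trial = min(prox_objective(t, v, 1.0) for t in trials)
print(p, prox_objective(p, v, 1.0) <= best_trial)
```

Entries whose magnitude is at most `lam` are set to zero and the rest shrink toward zero by `lam`, which is why an $\ell_1$ proximal term encourages sparse models.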
