Federated Learning Using Three-Operator ADMM

Shashi Kant, José Mairton B. da Silva, Gabor Fodor, Bo Göransson, Mats Bengtsson, Carlo Fischione

TL;DR

This paper tackles federated learning with data available both on edge devices and at a server colocated with the base station, introducing a three-operator ADMM framework (TOP-ADMM) to exploit server data alongside device data. It develops FedTOP-ADMM I/II, two variants that generalize FedADMM by incorporating a smooth server loss $h$ and a proximal term $g$, and provides convergence guarantees under general convexity. The approach yields substantial gains in communication efficiency—up to about 33% fewer communication rounds to reach a target accuracy—while maintaining or improving accuracy on real datasets (MNIST, CIFAR-10/100) under both i.i.d. and non-i.i.d. partitions. This enables effective edge learning by leveraging server-side data, with practical impact for 5G/6G edge networks where base stations hold rich data resources.
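As a rough orientation, the setting described above can be written as a three-operator consensus problem. The display below is a sketch under assumptions, not the paper's exact formulation: $f_m$ (the local loss of device $m$), $M$ (the number of devices), $x_m$ (local models), and $z$ (the global model) are notation introduced here for illustration; only $h$ and $g$ are named in the summary, and the paper may include additional weights or constraints.

$$
\begin{aligned}
\min_{\{x_m\}_{m=1}^{M},\, z} \quad & \sum_{m=1}^{M} f_m(x_m) + g(z) + h(z) \\
\text{subject to} \quad & x_m = z, \qquad m = 1, \ldots, M .
\end{aligned}
$$

Under this reading, setting $h \equiv 0$ leaves a two-operator consensus problem of the kind that classical consensus ADMM, and hence FedADMM, addresses.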

Abstract

Federated learning (FL) has emerged as a distributed machine learning paradigm that avoids transmitting data generated on the users' side. Although data are not transmitted, edge devices must cope with limited communication bandwidth, data heterogeneity, and straggler effects caused by the limited computational resources of users' devices. A prominent approach to overcoming these difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). FL algorithms, including FedADMM, commonly assume that the global model is learned using data only on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and to have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than utilizing only user datasets. Specifically, we show that merely applying FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model in parallel with the edge devices. Our numerical experiments indicate that FedTOP-ADMM achieves a substantial gain of up to 33% in communication efficiency to reach a desired test accuracy with respect to FedADMM, including a virtual user on the edge server.

Paper Structure

This paper contains 21 sections, 10 theorems, 58 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Consider a problem given in eqn_chD:general_consensus_top_admm__generic_form with at least one solution and a suitable step size $\tau \in \mathbb{R}_{\geq 0}$. Assume subproblems eqn_chD:update_xm__step1_parallel__general_top_admm__prox__for_convergence and eqn_chD:update_z__step2__general_to [...] at any limit point, converges to a KKT stationary point of eqn_chD:general_consensus_top_admm__gene [...]

Figures (10)

  • Figure 1: Illustration of FL architecture, with the new scenario investigated in this paper of a dataset available on the edge server.
  • Figure 2: Comparison of convergence behaviour between FedADMM and FedTOP-ADMM for the distributed sparse logistic regression problem (eqn_chD:distributed_l1_logistic_regression).
  • Figure 3: Examples of MNIST handwritten digits without scaling and with two different scaling approaches.
  • Figure 4: Convergence analysis of the existing FedAvg, FedProx, and FedADMM algorithms and our proposed FedTOP-ADMM I/II algorithms for various hyperparameters under $J = 10$ local iterations.
  • Figure 5: Comparison of FedTOP-ADMM I/II with FedADMM and FedADMM-VC under $J = 10$.
  • ...and 5 more figures

Theorems & Definitions (26)

  • Theorem 1: TOP-ADMM
  • proof
  • Remark 1
  • Remark 2
  • Theorem 2: Global convergence of FedTOP-ADMM algorithm
  • proof
  • Definition 1: $L$-smooth function [Bauschke:2011, Beck2017]
  • Definition 2: Subgradient [Bauschke:2011]
  • Definition 3: Proximal mapping [Parikh2013, Beck2017]
  • Lemma 1
  • ...and 16 more
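
To make the proximal mapping listed above concrete, the following minimal sketch evaluates it for the $\ell_1$ norm, the regularizer commonly used in sparse logistic regression. It is an illustration only and is not taken from the paper: the function names `prox_l1` and `prox_objective` and the parameter `lam` are hypothetical, and the closed form used is the standard soft-thresholding operator for $\mathrm{prox}_{\lambda \|\cdot\|_1}(v) = \arg\min_x \lambda \|x\|_1 + \tfrac{1}{2}\|x - v\|_2^2$.

```python
import numpy as np


def prox_l1(v, lam):
    """Proximal mapping of lam * ||x||_1 at v (soft-thresholding).

    Hypothetical helper for illustration; not the paper's code.
    """
    v = np.asarray(v, dtype=float)
    # Shrink each entry toward zero by lam; entries within [-lam, lam] become 0.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def prox_objective(x, v, lam):
    """Objective minimized by the proximal mapping: lam*||x||_1 + 0.5*||x - v||^2."""
    return lam * np.sum(np.abs(x)) + 0.5 * np.sum((x - v) ** 2)


v = np.array([3.0, -0.5, 1.0, -2.5])
p = prox_l1(v, lam=1.0)
print(p)
```

Entries of `v` with magnitude at most `lam` are set to zero, which is how an $\ell_1$ proximal step produces sparse models; `prox_objective` can be used to check numerically that no nearby point attains a lower value than `p`.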