Hierarchical Federated ADMM

Seyed Mohammad Azimi-Abarghouyi, Nicola Bastianello, Karl H. Johansson, Viktoria Fodor

TL;DR

This work replaces gradient-descent-based hierarchical FL with a framework that runs ADMM at the top layer, and introduces two algorithms: HierFADMM, which uses ADMM at the top layer and gradient-descent updates at the lower layer, and HierF2ADMM, which uses ADMM on both layers. The authors derive the cloud aggregation and lower-layer update rules, show privacy benefits from sharing only linear combinations of parameters, and prove convergence of the inexact ADMM formulations as the number of inner (intra-set) iterations tends to infinity. Experiments on logistic regression with the Adult dataset show improved convergence and accuracy for the ADMM-based methods, especially under non-i.i.d. data, with HierF2ADMM offering additional privacy gains. The resulting hierarchical FL approach is modular: alternative optimization methods can be integrated at either layer, with practical implications for scalable and privacy-aware distributed learning.

Abstract

In this paper, we depart from the widely used gradient descent-based hierarchical federated learning (FL) algorithms to develop a novel hierarchical FL framework based on the alternating direction method of multipliers (ADMM). Within this framework, we propose two novel FL algorithms, which both use ADMM in the top layer: one that employs ADMM in the lower layer and another that uses the conventional gradient descent-based approach. The proposed framework enhances privacy, and experiments demonstrate the superiority of the proposed algorithms compared to the conventional algorithms in terms of learning convergence and accuracy. Additionally, gradient descent on the lower layer performs well even if the number of local steps is very limited, while ADMM on both layers leads to better performance otherwise.
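To make the two-layer idea concrete, the following is a minimal sketch of a textbook scaled-form consensus ADMM between a cloud and several client sets, where each set's primal update is approximated by a few gradient steps. It is not the update rule derived in the paper: the least-squares loss (used here instead of logistic regression so the result can be checked in closed form), the synthetic data, and the names `make_data`, `set_gradient`, `hier_consensus_admm`, `rho`, and `tau` are all assumptions made for illustration.

```python
import numpy as np


def make_data(num_sets=3, clients_per_set=3, n=20, d=3, seed=0):
    """Synthetic least-squares data: data[c][k] = (A, b) for client k in set c."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    data = []
    for _ in range(num_sets):
        clients = []
        for _ in range(clients_per_set):
            A = rng.normal(size=(n, d))
            b = A @ w_true + 0.01 * rng.normal(size=n)
            clients.append((A, b))
        data.append(clients)
    return data


def objective(data, w):
    """Sum over all clients of half the mean squared residual."""
    return sum(0.5 * np.mean((A @ w - b) ** 2) for clients in data for A, b in clients)


def set_gradient(clients, w):
    """Sum of client gradients within one set; each client uses only its own data."""
    return sum(A.T @ (A @ w - b) / len(b) for A, b in clients)


def hier_consensus_admm(data, d, rho=1.0, tau=10, rounds=100):
    """Assumed sketch: consensus ADMM on top, tau gradient steps per set below."""
    num_sets = len(data)
    z = np.zeros(d)              # cloud (global) model
    w = np.zeros((num_sets, d))  # one model per client set
    u = np.zeros((num_sets, d))  # scaled dual variable per client set
    # Step size 1/L, where L is the smoothness constant of each set's
    # augmented objective (largest Hessian eigenvalue plus rho).
    steps = []
    for clients in data:
        H = sum(A.T @ A / len(b) for A, b in clients)
        steps.append(1.0 / (np.linalg.eigvalsh(H)[-1] + rho))
    for _ in range(rounds):
        for c, clients in enumerate(data):
            # Lower layer: tau gradient steps approximate the ADMM primal update.
            for _ in range(tau):
                grad = set_gradient(clients, w[c]) + rho * (w[c] - z + u[c])
                w[c] = w[c] - steps[c] * grad
        # Top layer: the cloud averages w_c + u_c, then each set updates its dual.
        z = (w + u).mean(axis=0)
        u += w - z
    return z


data = make_data()
z = hier_consensus_admm(data, d=3)
print(objective(data, z))
```

In this sketch the cloud only ever sees the sums `w[c] + u[c]`, which is one simple way a top-layer ADMM exchanges combinations of variables rather than raw models; whether this matches the paper's privacy argument in detail is not something the sketch establishes.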

Paper Structure

This paper contains 13 sections, 1 theorem, 25 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

Assume that each local loss $f_{kc}$ is closed, proper, and convex for all $k, c$. Consider a version of HierFADMM or HierF2ADMM in which the number of intra-set iterations $\tau^t$ varies over time and satisfies $\lim_{t \to \infty} \tau^t = \infty$. Then $\mathbf{w}^t$ and $\mathbf{w}_c^t$, $\forall c$ …
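The only requirement the proposition places on the schedule is that $\tau^t$ grows without bound. As one possible illustration, the sketch below increases the number of intra-set iterations by one every fixed number of global rounds; the function name `tau_schedule` and the parameters `tau0` and `grow_every` are hypothetical and do not come from the paper.

```python
def tau_schedule(t, tau0=1, grow_every=10):
    """Hypothetical schedule: tau^t = tau0 + floor(t / grow_every).

    It is non-decreasing in t and unbounded, so tau^t -> infinity as t -> infinity.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    return tau0 + t // grow_every


# Number of intra-set iterations at global rounds 0, 10, 20, 30, 40.
schedule = [tau_schedule(t) for t in range(0, 50, 10)]
print(schedule)
```

Slower-growing schedules (for example, logarithmic in $t$) would also satisfy the limit condition; the choice trades per-round cost against how quickly the inner updates become accurate.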

Figures (5)

  • Figure 1: Modular schematic of FL algorithms
  • Figure 2: Objective as a function of global iterations ($L=1$, i.i.d.)
  • Figure 3: Objective as a function of global iterations ($L=4$, i.i.d.)
  • Figure 4: Objective as a function of global iterations ($L=4$, i.i.d.)
  • Figure 5: Objective as a function of global iterations ($L=4$, non-i.i.d.)

Theorems & Definitions (1)

  • Proposition 1