Distributed Quasi-Newton Method for Fair and Fast Federated Learning

Shayan Mohajer Hamidi; Linfeng Ye

TL;DR

This work addresses fairness in federated learning when employing second-order updates by introducing DQN-Fed, a distributed quasi-Newton framework that enforces descent for all clients while preserving rapid Newton-like convergence. It achieves this through a two-stage design: gradient orthogonalization to span the same subspace as client gradients, followed by a closed-form minimum-norm convex combination to form a shared update direction that aligns local losses with Newton-step progress. The authors prove convergence to Pareto-stationary points and establish a linear-quadratic convergence rate, with explicit bounds showing super-linear behavior under favorable conditions. Empirically, DQN-Fed delivers superior fairness, higher average accuracy, and faster convergence across seven diverse datasets, including CIFAR-10/100, FEMNIST, and Shakespeare, while also offering favorable wall-clock performance among second-order methods. The approach also paves the way for integrating fairness-focused FL with robustness techniques against label noise, as illustrated by potential combinations with methods like FedCorr.
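The two-stage idea in the summary above can be illustrated with a minimal sketch. It is not the authors' algorithm: plain Gram-Schmidt stands in for the paper's Stage 1, the quasi-Newton scaling of the gradients is omitted, the client gradients are assumed linearly independent, and the function names (`gram_schmidt`, `min_norm_weights`, `common_direction`) are hypothetical. What it does show is why Stage 2 admits a closed form: for mutually orthogonal vectors, the minimum-norm convex combination has weights proportional to the inverse squared norms.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(grads: List[Vector]) -> List[Vector]:
    """Orthogonalize the gradients (assumed linearly independent).

    The output spans the same subspace as the input gradients.
    """
    ortho: List[Vector] = []
    for g in grads:
        r = list(g)
        for q in ortho:
            coeff = dot(r, q) / dot(q, q)  # projection of r onto q
            r = [ri - coeff * qi for ri, qi in zip(r, q)]
        ortho.append(r)
    return ortho


def min_norm_weights(ortho: List[Vector]) -> List[float]:
    """Closed-form minimizer of ||sum_k lam_k q_k||^2 over the simplex.

    For orthogonal q_k the objective is sum_k lam_k^2 ||q_k||^2, so the
    optimum is lam_k proportional to 1 / ||q_k||^2.
    """
    inv_sq_norms = [1.0 / dot(q, q) for q in ortho]
    total = sum(inv_sq_norms)
    return [w / total for w in inv_sq_norms]


def common_direction(grads: List[Vector]) -> Tuple[Vector, List[Vector], List[float]]:
    """Return the shared direction d, the orthogonalized vectors, and the weights."""
    ortho = gram_schmidt(grads)
    lam = min_norm_weights(ortho)
    dim = len(grads[0])
    d = [sum(l * q[i] for l, q in zip(lam, ortho)) for i in range(dim)]
    return d, ortho, lam


# Three toy "client gradients" in R^3 (illustrative values only).
client_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
d, ortho, lam = common_direction(client_grads)
```

With these weights, the inner product of `d` with every orthogonalized vector equals `||d||^2`, so `d` makes the same positive angle-weighted progress along each of them. Plain Gram-Schmidt alone does not guarantee a positive inner product with the original gradients, which is presumably why the paper uses its own orthogonalization in Stage 1.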

Abstract

Federated learning (FL) is a promising technology that enables edge devices/clients to collaboratively and iteratively train a machine learning model under the coordination of a central server. The most common approaches to FL are first-order methods, in which clients send their local gradients to the server in each iteration. However, these methods often suffer from slow convergence rates. As a remedy, second-order methods, such as quasi-Newton, can be employed in FL to accelerate convergence. Unfortunately, as with first-order FL methods, applying second-order methods in FL can lead to unfair models that achieve high average accuracy while performing poorly on certain clients' local datasets. To tackle this issue, in this paper we introduce a novel second-order FL framework, dubbed distributed quasi-Newton federated learning (DQN-Fed). This approach seeks to ensure fairness while leveraging the fast convergence properties of quasi-Newton methods in the FL context. Specifically, DQN-Fed helps the server update the global model in such a way that (i) all local loss functions decrease to promote fairness, and (ii) the rate of change in local loss functions aligns with that of the quasi-Newton method. We prove the convergence of DQN-Fed and demonstrate its linear-quadratic convergence rate. Moreover, we validate the efficacy of DQN-Fed across a range of federated datasets, showing that it surpasses state-of-the-art fair FL methods in fairness, average accuracy, and convergence speed.
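As background for the quasi-Newton methods the abstract refers to, here is a minimal sketch of one common variant, the BFGS update of an inverse-Hessian approximation. The abstract does not say which quasi-Newton update DQN-Fed uses, so BFGS is an assumption made for illustration, and the function name `bfgs_inverse_update` is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bfgs_inverse_update(H: Matrix, s: Vector, y: Vector) -> Matrix:
    """One BFGS update of the inverse-Hessian approximation H.

    s = theta_new - theta_old, y = grad_new - grad_old.
    Implements H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    with rho = 1 / (y^T s); requires the curvature condition y^T s > 0.
    """
    n = len(s)
    ys = sum(yi * si for yi, si in zip(y, s))
    if ys <= 0.0:
        raise ValueError("curvature condition y^T s > 0 is violated")
    rho = 1.0 / ys
    # A = I - rho * s y^T
    A = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)]
         for i in range(n)]
    # AH = A @ H
    AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # AHAt = AH @ A^T
    AHAt = [[sum(AH[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
    return [[AHAt[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]


# Toy step on a 2-D problem (illustrative values only).
H0 = [[1.0, 0.0], [0.0, 1.0]]
step = [1.0, 0.5]          # parameter change s
grad_change = [2.0, 0.5]   # gradient change y
H1 = bfgs_inverse_update(H0, step, grad_change)
```

The updated matrix satisfies the secant condition `H1 @ grad_change == step`, which is what lets the method mimic Newton-like curvature information using only gradient differences.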

Paper Structure (49 sections, 13 theorems, 66 equations, 3 figures, 12 tables, 1 algorithm)

Introduction
Related Works
Notation and Preliminaries
Notation
Preliminaries and Definitions
Multi-Objective Minimization for Fairness
Newton-type methods
Motivation and Methodology
Motivation
Methodology
Stage 1, Gradient Orthogonalization
Stage 2, finding optimal weights
DQN-Fed Algorithm
Convergence results
Experiments
...and 34 more sections

Key Result

Lemma 3.2

[mukai1980algorithms] Any Pareto-optimal solution is Pareto-stationary. On the other hand, if all $\{f_k(\boldsymbol{\theta})\}_{k \in [K]}$ are convex, then any Pareto-stationary solution is weakly Pareto optimal. (Footnote: $\boldsymbol{\theta}^*$ is called a weakly Pareto-optimal solution of eq:minpareto if ...)
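For reference, the two notions that Lemma 3.2 relates are commonly defined as follows for the multi-objective problem of minimizing $(f_1(\boldsymbol{\theta}), \dots, f_K(\boldsymbol{\theta}))$. These are the standard textbook definitions written in the lemma's notation; the paper's own Definition 3.1 and footnote may be worded differently.

```latex
% Pareto-stationary: some convex combination of the client gradients vanishes.
\exists\, \boldsymbol{\lambda} \in \mathbb{R}^K,\quad
\lambda_k \ge 0,\quad \sum_{k \in [K]} \lambda_k = 1
\quad \text{such that} \quad
\sum_{k \in [K]} \lambda_k \nabla f_k(\boldsymbol{\theta}^*) = \mathbf{0}.

% Weakly Pareto optimal: no point strictly improves every objective at once.
\nexists\, \boldsymbol{\theta}
\quad \text{such that} \quad
f_k(\boldsymbol{\theta}) < f_k(\boldsymbol{\theta}^*)
\quad \forall k \in [K].
```

Read this way, the first condition says that no direction decreases all local losses to first order, which is the natural stopping point for a method that insists on descent for every client.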

Figures (3)

Figure 1: Test accuracy vs. communication rounds for DQN-Fed and the benchmark methods.
Figure 2: Histogram of client accuracy for models trained via FedAvg, q-FFL and DQN-Fed.
Figure 3: Number of improved clients vs. communication rounds for DQN-Fed and the benchmark methods.

Theorems & Definitions (22)

Definition 3.1
Lemma 3.2
Theorem 4.1
proof
Theorem 4.2
Theorem 4.3: $E=1$ & local SGD
Theorem 4.4: $E>1$ & local GD
Theorem 4.5: $E=1$ & local GD
Theorem 4.6: Convergence rate of DQN-Fed
proof
...and 12 more
