FAGH: Accelerating Federated Learning with Approximated Global Hessian

Mrinmay Sen, A. K. Qin, Krishna Mohan C

TL;DR

The paper addresses the high communication burden in federated learning due to slow convergence by introducing FAGH, a Newton-method–based approach that avoids full Hessian computation. FAGH builds an approximated global Hessian from the first row of local Hessians and uses Adam-like moment estimates to form a global Newton direction on the server via the Sherman–Morrison formula, enabling efficient updates with reduced memory and communication. The method shows faster convergence and fewer communication rounds than several state-of-the-art FL methods on heterogeneous data partitions, including CIFAR10, FashionMNIST, and EMNIST. The results suggest FAGH offers practical, resource-efficient acceleration for second-order FL and opens avenues for adaptive, privacy-preserving client selection in future work.
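
The Sherman–Morrison formula mentioned above lets a server solve a linear system with a rank-one-plus-diagonal matrix without ever storing the full d x d matrix. The sketch below is a generic illustration of that identity, not the paper's exact update: the damped form `rho*I + u v^T`, the function name `sherman_morrison_solve`, and the symbols `rho`, `u`, `v`, and `g` are all assumptions introduced here for illustration.

```python
def sherman_morrison_solve(rho, u, v, g):
    """Solve (rho*I + u v^T) x = g in O(d) time and memory.

    Hypothetical helper: applies the Sherman-Morrison identity
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
    with A = rho*I, so the d x d matrix is never formed.
    """
    vg = sum(vi * gi for vi, gi in zip(v, g))  # v^T g
    vu = sum(vi * ui for vi, ui in zip(v, u))  # v^T u
    denom = rho * (rho + vu)
    if denom == 0:
        # The matrix is singular when rho == 0 or rho + v^T u == 0.
        raise ZeroDivisionError("rho*I + u v^T is singular")
    scale = vg / denom
    # x = g/rho - u * (v^T g) / (rho * (rho + v^T u))
    return [gi / rho - scale * ui for gi, ui in zip(g, u)]


# Example: treat g as a gradient and x as a Newton-type direction.
direction = sherman_morrison_solve(1.0, [1.0, 0.0], [1.0, 0.0], [2.0, 3.0])
```

Here the matrix is `[[2, 0], [0, 1]]`, so the solve only rescales the first coordinate of `g`. The cost is two dot products and one vector update, which is why this identity suits settings where storing or inverting a full Hessian is too expensive.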

Abstract

In federated learning (FL), slow convergence of the global model causes significant communication overhead: many communication rounds are required before training converges. One potential solution is to train with a Newton-based optimization method, which is known for its quadratic convergence rate. However, existing Newton-based FL training methods suffer from either memory inefficiency or high computational costs for local clients or the server. To address this issue, we propose FL with approximated global Hessian (FAGH), a method to accelerate FL training. FAGH uses the first moment of the approximated global Hessian and the first moment of the global gradient to train the global model. By exploiting the curvature captured by the approximated global Hessian, FAGH accelerates convergence of the global model, reducing the number of communication rounds and thus the training time. Experimental results verify that FAGH decreases the number of communication rounds and the time required to reach pre-specified targets for training loss, test loss, and test accuracy. Notably, FAGH outperforms several state-of-the-art FL training methods.
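
The "first moment" in the abstract can be read as an exponential moving average, as in Adam. The sketch below shows that running average with Adam's standard bias correction. The function names, the default `beta=0.9`, and the use of bias correction are assumptions for illustration; this page does not say which of these details FAGH adopts.

```python
def update_first_moment(m_prev, value, beta=0.9):
    """Exponential moving average: m_t = beta * m_{t-1} + (1 - beta) * value_t."""
    return [beta * m + (1.0 - beta) * x for m, x in zip(m_prev, value)]


def bias_corrected(m, beta, step):
    """Adam-style correction for a moment initialised at zero (step starts at 1)."""
    return [mi / (1.0 - beta ** step) for mi in m]


# Example: one server round with a hypothetical aggregated gradient.
m = [0.0, 0.0]
m = update_first_moment(m, [1.0, 2.0], beta=0.9)
m_hat = bias_corrected(m, beta=0.9, step=1)
```

The same update applies to any vector-valued quantity aggregated per round, such as a gradient or a stored row of a Hessian approximation. The bias correction matters most in the first few rounds, when the zero initialisation would otherwise shrink the estimate.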

Paper Structure (14 sections, 7 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparisons of training loss, test loss and test accuracy on CIFAR10 image classification using LeNet5
  • Figure 2: Comparisons of training loss, test loss and test accuracy on FashionMNIST image classification using CNN
  • Figure 3: Comparisons of training loss, test loss and test accuracy on EMNIST image classification using MLR
  • Figure 4: Time comparisons of training loss, test loss and test accuracy on CIFAR10 image classification using LeNet5
  • Figure 5: Time comparisons of training loss, test loss and test accuracy on FashionMNIST image classification using CNN
  • ...and 1 more figures