Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm

Ahmed Elbakary, Chaouki Ben Issaid, Mohammad Shehab, Karim Seddik, Tamer ElBatt, Mehdi Bennis

TL;DR

Fed-Sophia addresses the challenge of incorporating curvature information in federated learning for large models by using a diagonal Hessian estimate as a preconditioner within a second-order framework. It adopts Gauss-Newton-Bartlett diagonal estimation, combines it with an exponential moving average of the gradient, and employs an adaptive, per-dimension step size with a clipping mechanism to guard against misleading curvature. A periodic update of the diagonal Hessian and sharing only parameter vectors enable scalable convergence under non-IID data while maintaining communication efficiency. Empirical results on MNIST and Fashion-MNIST show that Fed-Sophia outperforms first-order FedAvg and several second-order baselines in convergence speed, robustness, and energy footprint, highlighting its practical potential for deploying second-order FL in real-world settings.
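
The Gauss-Newton-Bartlett (GNB) diagonal estimate mentioned above can be sketched for a simple model. This is a minimal illustration under stated assumptions, not the authors' code. The model (a linear softmax classifier with weights `W`), the helper names `softmax` and `gnb_diag_hessian`, and the use of NumPy are all hypothetical choices. The sketch follows the estimator introduced with the original Sophia optimizer: labels are resampled from the model's own predictive distribution, the mini-batch gradient `g_hat` is recomputed with those labels, and `B * g_hat**2` serves as the diagonal estimate.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gnb_diag_hessian(W, X, rng):
    """Gauss-Newton-Bartlett estimate of diag(Hessian) for softmax cross-entropy.

    W: (d, c) weights of a hypothetical linear softmax classifier; X: (B, d) mini-batch.
    Returns an array with the same shape as W.
    """
    B, c = X.shape[0], W.shape[1]
    probs = softmax(X @ W)  # model's predictive distribution, shape (B, c)
    # Sample one label per example from the model itself (true labels are not used),
    # using inverse-CDF sampling on each row of probs.
    u = rng.random((B, 1))
    y_hat = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), c - 1)
    onehot = np.eye(c)[y_hat]
    # Mini-batch gradient of the mean cross-entropy w.r.t. W, using the sampled labels.
    g_hat = X.T @ (probs - onehot) / B
    # Bartlett-identity estimate: B times the elementwise square of g_hat.
    return B * g_hat * g_hat


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))        # hypothetical mini-batch: 8 examples, 4 features
W = 0.1 * rng.normal(size=(4, 3))  # 3 classes
h_hat = gnb_diag_hessian(W, X, rng)
print(h_hat.shape)  # prints (4, 3)
```

For a linear softmax model, the Gauss-Newton matrix coincides with the Hessian. Averaging many such estimates should therefore approach the exact diagonal `(1/B) * sum_b x_b[i]**2 * p_b[k] * (1 - p_b[k])`. Each estimate costs an extra gradient computation, which is one reason to refresh it only periodically.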

Abstract

Federated learning is a machine learning approach in which multiple devices collaboratively learn a model with the help of a parameter server by sharing only their local updates. While gradient-based optimization techniques are widely adopted in this domain, the curvature information exploited by second-order methods is crucial for guiding and speeding up convergence. This paper introduces a scalable second-order method that allows curvature information to be used when training large models in federated settings. Our method, dubbed Fed-Sophia, combines a weighted moving average of the gradient with a clipping operation to find the descent direction. In addition, a lightweight estimate of the Hessian's diagonal is used to incorporate curvature information. Numerical evaluation shows the superiority, robustness, and scalability of the proposed Fed-Sophia scheme compared with first- and second-order baselines.
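
The following toy simulation illustrates the clipped, per-coordinate update and the parameter-only exchange. It is a sketch under stated assumptions, not the paper's algorithm:

- Each client minimizes a hypothetical quadratic `0.5 * sum(curv * (x - center)**2)`. Its exact diagonal curvature `curv` stands in for a GNB estimate.
- The step follows the Sophia-style rule theta <- theta - lr * clip(m / max(gamma * h, eps), 1).
- The gradient moving average is reset at the start of every round.
- The server takes an unweighted average of the uploaded parameter vectors.

The names `local_sophia`, `fed_round`, and `simulate`, along with all hyperparameter values, are illustrative.

```python
import numpy as np


def clip(z, rho=1.0):
    # Element-wise clip to [-rho, rho]; bounds each coordinate's step.
    return np.clip(z, -rho, rho)


def local_sophia(theta, center, curv, h, steps=20, lr=0.05, beta1=0.9,
                 beta2=0.99, gamma=1.0, eps=1e-12, k=5):
    """Sophia-style local steps on the toy loss 0.5 * sum(curv * (theta - center)**2)."""
    m = np.zeros_like(theta)         # gradient EMA, reset each round (assumption)
    for t in range(steps):
        g = curv * (theta - center)  # exact gradient of the toy quadratic
        m = beta1 * m + (1 - beta1) * g
        if t % k == 0:               # refresh the Hessian-diagonal EMA every k steps
            h_hat = curv             # stand-in for a GNB estimate (assumption)
            h = beta2 * h + (1 - beta2) * h_hat
        theta = theta - lr * clip(m / np.maximum(gamma * h, eps))
    return theta, h


def fed_round(theta_global, centers, curv, h_states):
    """One round: clients train locally and upload only their parameter vectors."""
    uploads = []
    for i, center in enumerate(centers):
        theta_i, h_states[i] = local_sophia(theta_global.copy(), center, curv, h_states[i])
        uploads.append(theta_i)
    return np.mean(uploads, axis=0)  # unweighted server average (assumption)


def simulate(rounds=50, num_clients=4, seed=0):
    rng = np.random.default_rng(seed)
    curv = np.array([100.0, 1.0, 0.01])  # ill-conditioned diagonal curvature
    # A different optimum per client mimics non-IID data.
    centers = [0.1 * rng.normal(size=3) for _ in range(num_clients)]
    h_states = [np.zeros(3) for _ in range(num_clients)]
    theta = np.zeros(3)
    for _ in range(rounds):
        theta = fed_round(theta, centers, curv, h_states)
    # With identical curvature on every client, the global minimizer is the mean optimum.
    return theta, np.mean(centers, axis=0)


theta, target = simulate()
print("max |theta - target|:", np.abs(theta - target).max())
```

The step divides the gradient average by the curvature estimate. As a result, the coordinates with curvature 100 and 0.01 move at comparable rates. The clip caps each coordinate's step at `lr` while the curvature estimate is still small or unreliable.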

Paper Structure (16 sections, 14 equations, 3 figures, 2 tables, 2 algorithms)

Figures (3)

  • Figure 1: Effect of the Hessian-based step size compared with a gradient-based method.
  • Figure 2: Test accuracy of Fed-Sophia versus baselines as a function of communication rounds on the MNIST and FMNIST datasets, using MLP and CNN models.
  • Figure 3: Test accuracy of Fed-Sophia versus baselines as a function of the total number of iterations on the MNIST and FMNIST datasets, using an MLP.