Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm
Ahmed Elbakary, Chaouki Ben Issaid, Mohammad Shehab, Karim Seddik, Tamer ElBatt, Mehdi Bennis
TL;DR
Fed-Sophia addresses the challenge of incorporating curvature information into federated learning (FL) of large models by using a diagonal Hessian estimate as a preconditioner within a second-order framework. It adopts the Gauss-Newton-Bartlett estimator for the Hessian diagonal, combines it with an exponential moving average of the gradient, and uses an adaptive, per-dimension step size with a clipping mechanism to guard against misleading curvature estimates. Updating the diagonal Hessian estimate only periodically, and sharing only parameter vectors with the server, keeps the method scalable and communication-efficient while still converging under non-IID data. Empirical results on MNIST and Fashion-MNIST show that Fed-Sophia outperforms first-order FedAvg and several second-order baselines in convergence speed, robustness, and energy footprint, highlighting its practical potential for deploying second-order FL in real-world settings.
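The local update summarized above can be sketched as follows. This is a minimal, illustrative sketch, not the authors' implementation: it assumes a binary logistic-regression model and an update rule modeled on the Sophia optimizer, and the hyperparameter names (`lr`, `beta1`, `beta2`, `gamma`, `eps`, `k`) and all function names are hypothetical. Weight decay and the federated aggregation step are omitted. The Gauss-Newton-Bartlett estimate resamples labels from the model's own predictions, squares the resulting minibatch gradient elementwise, and scales it by the batch size B.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_loss(w, xs, ys):
    # Mean binary cross-entropy of a linear logistic model.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def mean_grad(w, xs, ys):
    # Gradient of mean_loss with respect to w.
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sigmoid(dot(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return [gj / len(xs) for gj in g]


def gnb_diag(w, xs, rng):
    # Gauss-Newton-Bartlett: draw labels from the model's own predictions,
    # take the minibatch gradient on those labels, square it, scale by B.
    sampled = [1.0 if rng.random() < sigmoid(dot(w, x)) else 0.0 for x in xs]
    g_hat = mean_grad(w, xs, sampled)
    return [len(xs) * gj * gj for gj in g_hat]


def sophia_local_steps(w, xs, ys, steps, lr=0.1, beta1=0.9, beta2=0.99,
                       gamma=0.01, eps=1e-12, k=10, seed=0):
    # Hypothetical client routine: EMA gradient, periodic diagonal Hessian,
    # clipped per-coordinate preconditioned step.
    rng = random.Random(seed)
    w = list(w)
    m = [0.0] * len(w)  # EMA of the gradient
    h = [0.0] * len(w)  # EMA of the diagonal Hessian estimate
    for t in range(steps):
        g = mean_grad(w, xs, ys)
        m = [beta1 * mj + (1 - beta1) * gj for mj, gj in zip(m, g)]
        if t % k == 0:  # refresh the curvature estimate only every k steps
            h_hat = gnb_diag(w, xs, rng)
            h = [beta2 * hj + (1 - beta2) * hh for hj, hh in zip(h, h_hat)]
        # Clip the preconditioned ratio to [-1, 1] so each coordinate moves
        # by at most lr even when h is tiny or unreliable.
        w = [wj - lr * max(-1.0, min(1.0, mj / max(gamma * hj, eps)))
             for wj, mj, hj in zip(w, m, h)]
    return w


# Toy usage: the first feature is a constant bias term.
xs = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
ys = [1.0, 1.0, 0.0, 0.0]
w = sophia_local_steps([0.0, 0.0], xs, ys, steps=50)
print(mean_loss([0.0, 0.0], xs, ys), mean_loss(w, xs, ys))
```

Because the ratio is clipped before being scaled by `lr`, a coordinate whose curvature estimate is near zero or badly underestimated falls back to a sign-of-momentum step of size `lr` instead of an arbitrarily large one.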
Abstract
Federated learning is a machine learning approach in which multiple devices collaboratively train a model with the help of a parameter server by sharing only their local updates. While gradient-based optimization techniques are widely adopted in this domain, the curvature information exploited by second-order methods is crucial for guiding and accelerating convergence. This paper introduces a scalable second-order method that makes it possible to use curvature information when training large models in a federated setting. Our method, coined Fed-Sophia, combines a weighted moving average of the gradient with a clipping operation to find the descent direction. In addition, a lightweight estimate of the Hessian's diagonal is used to incorporate curvature information. Numerical evaluation shows the superiority, robustness, and scalability of the proposed Fed-Sophia scheme compared to first- and second-order baselines.
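The client/server interaction described in the abstract can be illustrated with a toy federated loop. The sketch below is a hypothetical example, not the authors' implementation. Each client owns a quadratic objective whose Hessian diagonal is known exactly (standing in for the Gauss-Newton-Bartlett estimate), runs a few clipped, diagonally preconditioned momentum steps, and uploads only its parameter vector. The server then forms a sample-weighted average, which is an assumed FedAvg-style aggregation rule. Restarting the client momentum from zero every round is another assumption, and the client data, curvatures, and all names are illustrative.

```python
from typing import List, Tuple

# Hypothetical toy client: (diagonal curvature a_k, local optimum c_k, sample count n_k).
Client = Tuple[List[float], List[float], int]


def local_loss(w, a, c):
    # Toy client objective f_k(w) = 0.5 * sum_j a_j * (w_j - c_j)^2.
    return 0.5 * sum(aj * (wj - cj) ** 2 for aj, wj, cj in zip(a, w, c))


def local_update(w, a, c, steps, lr=0.5, beta1=0.9, eps=1e-12):
    # Clipped, diagonally preconditioned momentum steps on the client.
    # For this quadratic the Hessian diagonal is exactly a, so no estimator is needed.
    w = list(w)
    m = [0.0] * len(w)  # momentum stays on the device and is never uploaded
    for _ in range(steps):
        g = [aj * (wj - cj) for aj, wj, cj in zip(a, w, c)]
        m = [beta1 * mj + (1 - beta1) * gj for mj, gj in zip(m, g)]
        w = [wj - lr * max(-1.0, min(1.0, mj / max(aj, eps)))
             for wj, mj, aj in zip(w, m, a)]
    return w


def server_round(w_global, clients: List[Client], local_steps=5):
    # Broadcast w_global, collect each client's parameter vector, and
    # average them weighted by local sample count (assumed aggregation rule).
    total = sum(n for _, _, n in clients)
    new_w = [0.0] * len(w_global)
    for a, c, n in clients:
        w_k = local_update(w_global, a, c, local_steps)
        new_w = [acc + (n / total) * v for acc, v in zip(new_w, w_k)]
    return new_w


def global_loss(w, clients):
    # Sample-weighted sum of the client objectives.
    return sum(n * local_loss(w, a, c) for a, c, n in clients)


# Two non-IID clients whose local optima disagree.
clients = [([1.0, 4.0], [1.0, 0.0], 100), ([2.0, 1.0], [-1.0, 2.0], 50)]
w = [5.0, 5.0]
print("initial:", global_loss(w, clients))
for _ in range(30):
    w = server_round(w, clients)
print("after 30 rounds:", global_loss(w, clients), w)
```

In this sketch only a vector of the same length as the model crosses the network in each direction per round, while the momentum buffer and curvature information remain on the client.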
