Accelerated Training of Federated Learning via Second-Order Methods
Mrinmay Sen, Sidhant R Nair, C Krishna Mohan
TL;DR
This paper surveys second-order optimization methods in Federated Learning (FL) that address slow global model convergence and excessive communication rounds. It categorizes methods by how they use Hessian information (Hessian-free, full Hessian, Nyström, diagonal, and rank-one approaches) and analyzes their empirical and theoretical trade-offs in computation, memory, and transmission cost under varying data heterogeneity and client participation. The authors provide a unified taxonomy, benchmark multiple algorithms, and discuss the strengths, weaknesses, and practical considerations of using curvature information in scalable FL. The work highlights the potential of Hessian-informed FL to substantially reduce communication overhead, while noting significant challenges in Hessian computation, inversion, and privacy-preserving aggregation. Overall, it lays the groundwork for scalable, efficient second-order federated optimizers that can handle non-IID data and partial participation in real-world deployments.
Abstract
This paper explores second-order optimization methods in Federated Learning (FL), addressing the critical challenges of slow convergence and the excessive communication rounds required to achieve optimal performance from the global model. While existing surveys in FL primarily focus on challenges related to statistical and device label heterogeneity, as well as privacy and security concerns in first-order FL methods, less attention has been given to the issue of slow model training. Slow training often requires excessive communication rounds or increases communication costs, particularly when data across clients are highly heterogeneous. In this paper, we examine various FL methods that leverage second-order optimization to accelerate the training process. We provide a comprehensive categorization of state-of-the-art second-order FL methods and compare them on convergence speed, computational cost, memory usage, transmission overhead, and generalization of the global model. Our findings show the potential of incorporating Hessian curvature into FL through second-order optimization and highlight key challenges, such as the efficient use of the Hessian and its inverse in FL. This work lays the groundwork for future research aimed at developing scalable and efficient federated optimization methods for improving the training of the global model in FL.
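To make the trade-off described above concrete, the following is a minimal toy sketch and not any of the surveyed algorithms. It assumes each client holds a strongly convex quadratic objective f_i(w) = 0.5 wᵀA_i w − b_iᵀw with full participation, and compares how many communication rounds a first-order server update and a full-Hessian Newton-type server update need to reach the same gradient tolerance. All names (`make_clients`, `first_order_round`, `newton_round`, `rounds_to_converge`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def make_clients(num_clients=4, dim=5, seed=0):
    """Create heterogeneous toy clients, each with its own quadratic (A_i, b_i)."""
    rng = np.random.default_rng(seed)
    clients = []
    for _ in range(num_clients):
        m = rng.normal(size=(dim, dim))
        a = m @ m.T + np.eye(dim)  # symmetric positive definite local Hessian
        b = rng.normal(size=dim)
        clients.append((a, b))
    return clients


def global_gradient(clients, w):
    """Average of the local gradients A_i w - b_i (what the server aggregates)."""
    return np.mean([a @ w - b for a, b in clients], axis=0)


def global_hessian(clients):
    """Average of the local Hessians; costs O(dim^2) per client to transmit."""
    return np.mean([a for a, _ in clients], axis=0)


def first_order_round(clients, w, lr):
    """One round in which clients send gradients and the server takes a gradient step."""
    return w - lr * global_gradient(clients, w)


def newton_round(clients, w):
    """One round in which clients also send Hessians and the server takes a Newton step."""
    g = global_gradient(clients, w)
    h = global_hessian(clients)
    return w - np.linalg.solve(h, g)  # solve H d = g instead of forming H^{-1}


def rounds_to_converge(step, clients, w0, tol=1e-8, max_rounds=10_000):
    """Count rounds until the global gradient norm falls below tol."""
    w = w0.copy()
    for r in range(max_rounds):
        if np.linalg.norm(global_gradient(clients, w)) < tol:
            return r
        w = step(w)
    return max_rounds


if __name__ == "__main__":
    clients = make_clients()
    w0 = np.zeros(5)
    # Step size 1/L, where L is the largest eigenvalue of the averaged Hessian.
    lr = 1.0 / np.linalg.eigvalsh(global_hessian(clients)).max()
    gd = rounds_to_converge(lambda w: first_order_round(clients, w, lr), clients, w0)
    nt = rounds_to_converge(lambda w: newton_round(clients, w), clients, w0)
    print("first-order rounds:", gd)
    print("Newton-type rounds:", nt)
```

On a quadratic objective the Newton-type update reaches the optimum in a single round, while the first-order update needs a number of rounds that grows with the condition number of the averaged Hessian. The price is that every client transmits a dim × dim matrix and the server solves a linear system, which is the computation, memory, and transmission cost that the Hessian-free, Nyström, diagonal, and rank-one categories aim to reduce.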
