Scalable and Resource-Efficient Second-Order Federated Learning via Over-the-Air Aggregation

Abdulmomen Ghalkha, Chaouki Ben Issaid, Mehdi Bennis

TL;DR

This work tackles the high computation and communication costs of second-order federated learning by introducing Fed-Sophia, a diagonal Gauss-Newton-Bartlett Hessian-based approach with EMA and stability clipping that avoids full Hessian storage. It further extends to Analog Over-The-Air Fed-Sophia, which exploits wireless channel superposition to aggregate updates over the air, transmitting only CSI-filtered entries under a power-constrained framework and with channel-aware scheduling. The combination yields scalable, energy-efficient second-order FL capable of handling large models, as demonstrated by substantial reductions in communication uploads, latency, and energy consumption across MNIST, Sent140, CIFAR-10, and CIFAR-100 while maintaining or improving accuracy, and showing robustness to non-IID data. The results indicate that OTA aggregation can make second-order FL practical for large-scale deployments, offering strong gains in both efficiency and scalability with privacy-preserving updates.
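The local update summarized above combines an exponential moving average (EMA) of the gradient, an EMA of a diagonal Hessian estimate, and element-wise clipping. The following is a minimal sketch of one such step, assuming the update takes the form used by the Sophia optimizer; the exact Fed-Sophia rule and its hyperparameters may differ. The function name `sophia_style_step`, the default values, and the input `h_hat` (standing in for a diagonal Gauss-Newton-Bartlett estimate computed elsewhere) are illustrative assumptions, not taken from the paper.

```python
def ema(prev, new, beta):
    """Element-wise exponential moving average of two equal-length lists."""
    return [beta * p + (1.0 - beta) * n for p, n in zip(prev, new)]


def clip(x, rho):
    """Clamp x to the interval [-rho, rho]."""
    return max(-rho, min(rho, x))


def sophia_style_step(theta, m, h, grad, h_hat, lr=0.1, beta1=0.9,
                      beta2=0.99, gamma=1.0, eps=1e-12, rho=1.0):
    """One clipped, diagonally preconditioned step (assumed Sophia-style form).

    theta: parameters, m: gradient EMA, h: diagonal Hessian EMA,
    grad: current gradient, h_hat: current diagonal Hessian estimate.
    Only vectors of the model's size are stored, never a full Hessian.
    """
    m = ema(m, grad, beta1)
    h = ema(h, h_hat, beta2)
    # Divide by the curvature estimate, then clip so that entries with
    # tiny or unreliable curvature cannot produce an oversized step.
    theta = [t - lr * clip(mi / max(gamma * hi, eps), rho)
             for t, mi, hi in zip(theta, m, h)]
    return theta, m, h


theta, m, h = sophia_style_step(
    theta=[1.0, 1.0], m=[0.0, 0.0], h=[0.0, 0.0],
    grad=[1.0, 0.001], h_hat=[0.0, 100.0],
)
```

In this toy call the first coordinate has zero estimated curvature, so its step is clipped to `lr * rho`, while the second coordinate takes a small step scaled by its curvature.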

Abstract

Second-order federated learning (FL) algorithms offer faster convergence than their first-order counterparts by leveraging curvature information. However, they are hindered by high computational and storage costs, particularly for large-scale models. Furthermore, the communication overhead associated with large models and digital transmission exacerbates these challenges, causing communication bottlenecks. In this work, we propose a scalable second-order FL algorithm that uses a sparse Hessian estimate and leverages over-the-air aggregation, making it feasible for larger models. Our simulation results demonstrate savings of more than $67\%$ in communication resources and energy compared to other first- and second-order baselines.
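Over-the-air aggregation relies on the wireless channel summing simultaneously transmitted analog signals, so the server receives the aggregate directly rather than one upload per client. The toy simulation below illustrates the idea under simplifying assumptions that are not taken from the paper: real-valued per-entry channel gains known to each client, truncated channel inversion as the pre-equalization rule, and a hypothetical `threshold` below which an entry is not transmitted. The function name `ota_aggregate` is likewise illustrative.

```python
import random


def ota_aggregate(updates, gains, threshold=0.1, noise_std=0.0, rng=None):
    """Simulate analog over-the-air averaging of client updates.

    updates[k][j]: entry j of client k's update.
    gains[k][j]:   assumed real channel gain seen by that entry.
    Entries whose gain magnitude is below `threshold` are not sent,
    which bounds the transmit power spent on inverting weak channels.
    """
    rng = rng or random.Random(0)
    dim = len(updates[0])
    aggregate = []
    for j in range(dim):
        received = 0.0
        active = 0
        for update, gain in zip(updates, gains):
            g = gain[j]
            if abs(g) < threshold:
                continue  # deep fade: this client skips entry j
            tx = update[j] / g      # client pre-inverts its channel
            received += g * tx      # channel scales and superposes signals
            active += 1
        received += rng.gauss(0.0, noise_std)  # additive receiver noise
        # Server normalizes by the number of clients that sent entry j.
        aggregate.append(received / active if active else 0.0)
    return aggregate


avg = ota_aggregate(
    updates=[[1.0, 2.0], [3.0, 4.0]],
    gains=[[0.5, 0.05], [2.0, 1.0]],
)
```

With zero noise, the first entry is the average of both clients' values, while the second entry comes only from the second client because the first client's gain for that entry falls below the threshold.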

Paper Structure

This paper contains 10 sections, 12 equations, 1 figure, 2 tables, 1 algorithm.

Figures (1)

  • Figure 1: Test accuracy of Fed-Sophia and OTA Fed-Sophia against other baselines as a function of the number of communication uploads for (a) the MNIST dataset using an MLP model, (b) the Sent140 dataset using an LSTM model, (c) the CIFAR-10 dataset using a CNN model, and (d) the CIFAR-100 dataset using a ResNet architecture.