Non-Convex Optimization in Federated Learning via Variance Reduction and Adaptive Learning
Dipanwita Thakur, Antonella Guzzo, Giancarlo Fortino, Sajal K. Das
TL;DR
This work addresses non-convex federated optimization under non-IID data by introducing a momentum-based variance-reduction framework with adaptive learning rates for both local updates and global aggregation. The method reduces gradient variance and communication rounds, achieving convergence to an $\epsilon$-stationary point with improved $\mathcal{O}(\epsilon^{-1})$ communication complexity and strong empirical results on MNIST and CIFAR-10. The combination of momentum-based variance reduction and adaptivity mitigates client drift and accelerates convergence without extra per-client storage or communication. The study highlights practical gains for cross-device FL while noting limitations related to client participation assumptions and suggesting future work on cross-silo extensions.
Abstract
This paper proposes a novel federated algorithm that combines momentum-based variance reduction with adaptive learning rates to address non-convex optimization over heterogeneous data. We aim to minimize communication and computation overhead, thereby fostering a sustainable federated learning system. The algorithm targets two obstacles: gradient variance, which hinders the model's efficiency, and the slow convergence that results from learning-rate adjustments under heterogeneous data. Experimental results on image classification tasks with heterogeneous data (MNIST, CIFAR-10) demonstrate the effectiveness of the proposed algorithms in non-convex settings, with an improved communication complexity of $\mathcal{O}(\epsilon^{-1})$ to converge to an $\epsilon$-stationary point, compared with the $\mathcal{O}(\epsilon^{-2})$ communication complexity of most prior works. The proposed federated version balances the trade-off among convergence rate, number of communication rounds, and test accuracy while mitigating client drift in heterogeneous settings.
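The abstract names the ingredients but does not spell out the update rule. As a rough illustration only, the sketch below assumes a STORM-style momentum-based variance-reduced direction combined with an AdaGrad-norm-like adaptive step size, run on a single client with a toy non-convex loss. The function names, the parameters `a`, `c`, and `sigma`, and the step-size rule are all hypothetical choices for this example, not the authors' algorithm, and server-side aggregation is omitted.

```python
import numpy as np


def true_grad(x):
    # Gradient of the toy non-convex loss f(x) = sum(x**2 / (1 + x**2)).
    return 2.0 * x / (1.0 + x**2) ** 2


def mvr_direction(g_new, g_old, d_prev, a):
    # Assumed STORM-style estimator:
    # d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t))
    return g_new + (1.0 - a) * (d_prev - g_old)


def local_update(x0, steps=200, a=0.1, c=0.5, sigma=0.1, seed=0):
    # Hypothetical single-client loop; sigma is the std of the simulated gradient noise.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = true_grad(x) + sigma * rng.standard_normal(x.shape)  # first direction: plain stochastic gradient
    g_sq_sum = float(d @ d)
    for _ in range(steps):
        # Assumed adaptive step size: shrinks as squared gradient norms accumulate.
        lr = c / (1.0 + g_sq_sum) ** (1.0 / 3.0)
        x_prev, x = x, x - lr * d
        # One fresh sample xi_t, evaluated at both the new and the previous iterate.
        noise = sigma * rng.standard_normal(x.shape)
        g_new = true_grad(x) + noise
        g_old = true_grad(x_prev) + noise
        d = mvr_direction(g_new, g_old, d, a)
        g_sq_sum += float(g_new @ g_new)
    return x


if __name__ == "__main__":
    x0 = np.array([1.0, -0.5])
    x_end = local_update(x0)
    print("initial gradient norm:", np.linalg.norm(true_grad(x0)))
    print("final gradient norm:  ", np.linalg.norm(true_grad(x_end)))
```

In this sketch, setting `a = 1.0` discards the correction term and recovers plain stochastic gradient descent, while smaller values of `a` reuse more of the previous direction. Evaluating the same sample at both iterates is what lets the correction term cancel part of the sampling noise.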
