
Rethinking the initialization of Momentum in Federated Learning with Heterogeneous Data

Chenguang Xiao, Shuo Wang

TL;DR

The paper addresses the suboptimal momentum initialization in Federated Learning caused by data heterogeneity. It introduces Reversed Momentum Federated Learning (RMFL), which uses a reverse exponential decay to weight gradients when forming the local momentum initialization, mitigating bias from late local updates. Empirical results on MNIST, CIFAR-10, and CIFAR-100 across varying non-IID levels show RMFL consistently improves accuracy and macro F1, with robustness to larger numbers of local epochs. The approach enhances the reliability and efficiency of momentum-based optimization in heterogeneous FL and may inspire further refinements in momentum aggregation for distributed learning.
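The difference between conventional and reversed weighting is easiest to see on a toy sequence of local gradients. Below is a minimal sketch. It assumes plain heavy-ball accumulation for the conventional case and unnormalized weights beta**(t-1) for the reversed case. The function names and the exact weighting are hypothetical illustrations, not taken from the paper, which may normalize or combine the weights differently.

```python
def standard_momentum(grads, beta):
    """Classic accumulation m_t = beta * m_{t-1} + g_t, starting from m_0 = 0.

    After T steps the gradient from step t (1-indexed) carries weight
    beta ** (T - t), so the most recent gradient has the largest weight.
    """
    m = [0.0] * len(grads[0])
    for g in grads:
        m = [beta * mi + gi for mi, gi in zip(m, g)]
    return m


def reversed_momentum(grads, beta):
    """Hypothetical reversed weighting (assumption, not the paper's formula):
    the gradient from step t (1-indexed) carries weight beta ** (t - 1),
    so the earliest gradient has the largest weight.
    """
    m = [0.0] * len(grads[0])
    weight = 1.0
    for g in grads:
        m = [mi + weight * gi for mi, gi in zip(m, g)]
        weight *= beta  # each later step is down-weighted by another factor of beta
    return m


# One-hot "gradients": coordinate t of the result is the weight given to step t.
steps = [[1.0 if i == t else 0.0 for i in range(4)] for t in range(4)]
print(standard_momentum(steps, 0.5))  # [0.125, 0.25, 0.5, 1.0]
print(reversed_momentum(steps, 0.5))  # [1.0, 0.5, 0.25, 0.125]
```

The one-hot gradients are used only so that each output coordinate reads off the weight assigned to one local step. The two schemes use the same set of weights, but in opposite temporal order.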

Abstract

Data heterogeneity is a major challenge to the performance of Federated Learning. Recently, momentum-based optimization techniques have been shown to be effective in mitigating the heterogeneity issue. Along with the model updates, the momentum updates are transmitted to the server and aggregated, so local training initialized with a global momentum is guided by the global history of the gradients. However, we identify a problem with the traditional accumulation of momentum that makes it suboptimal in Federated Learning systems. Traditional momentum weights historical gradients less and recent gradients more. As a result, it places the most weight on the more biased local gradients produced toward the end of local training. In this work, we propose a new way to calculate the estimated momentum used in local initialization, named Reversed Momentum Federated Learning (RMFL). The key idea is to assign exponentially decaying weights to the gradients as time moves forward, which is the opposite of traditional momentum accumulation. The effectiveness of RMFL is evaluated on three popular benchmark datasets with different heterogeneity levels.
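The communication pattern described in the abstract can be made concrete with one training round on a toy one-dimensional problem. This is a minimal sketch under loudly stated assumptions. Each client k minimizes the hypothetical loss 0.5 * (w - c_k)**2 with its own optimum c_k, as a stand-in for heterogeneous data. The server averages models and momentum estimates uniformly. The `reverse=True` branch uses unnormalized weights beta**i. `local_train`, `fl_round`, and all parameter values are illustrative names, not the paper's code or exact formulation.

```python
def local_train(w, m_init, center, lr, beta, steps, reverse):
    """Local SGD with heavy-ball momentum on the toy loss 0.5 * (w - center) ** 2.

    The momentum buffer starts from the global momentum m_init. Returns the
    local model and the momentum estimate the client sends to the server.
    """
    m = m_init
    grads = []
    for _ in range(steps):
        g = w - center        # gradient of the toy quadratic loss
        grads.append(g)
        m = beta * m + g      # standard accumulation drives the local step
        w = w - lr * m
    if reverse:
        # Hypothetical reversed estimate (assumption): the i-th local gradient
        # (0-indexed) gets weight beta ** i, so early local gradients dominate.
        estimate = sum(beta ** i * g for i, g in enumerate(grads))
    else:
        # Traditional estimate: the final buffer, dominated by late gradients.
        estimate = m
    return w, estimate


def fl_round(w_global, m_global, centers, lr=0.1, beta=0.9, steps=5, reverse=False):
    """One round: every client starts from the global model and momentum,
    then the server uniformly averages the returned models and estimates."""
    results = [local_train(w_global, m_global, c, lr, beta, steps, reverse)
               for c in centers]
    w_new = sum(w for w, _ in results) / len(results)
    m_new = sum(m for _, m in results) / len(results)
    return w_new, m_new


# Two clients with far-apart optima mimic strongly non-IID data.
w, m = 0.0, 0.0
for _ in range(3):
    w, m = fl_round(w, m, centers=[-1.0, 3.0], reverse=True)
print(w, m)
```

With a single local step the two estimates coincide. They diverge as the number of local steps grows, which is the regime the abstract is concerned with.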

Paper Structure

This paper contains 12 sections, 3 equations, 27 figures, 1 table.

Figures (27)

  • Figure 1: Stochastic Gradient Descent
  • Figure 2: Stochastic Gradient Descent with Momentum
  • Figure 3: Illustration of the momentum cumulation in FL systems.
  • Figure 4: Box plot of the cumulative momentum in different global iterations.
  • Figure 5: Box plot of the active clients' gradient projection length.
  • ...and 22 more figures