Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
Jinwoo Lim, Suhyun Kim, Soo-Mook Moon
TL;DR
This work extends backward error analysis to federated learning, deriving a modified loss (implicit regularizer) that governs the actual gradient flow under finite-step updates. The authors derive explicit forms for FedAvg, FedSAM, and SCAFFOLD, revealing a dispersion term that increases gradient variance and biases updates, and a second-order term that can modulate convergence, especially under multiple local epochs. Empirical results on MNIST, FEMNIST, and CIFAR-10 validate the theory: FedAvg suffers from dispersion-induced bias and sharper minima, while FedSAM and SCAFFOLD mitigate dispersion to varying degrees, with high-order terms limiting variance-reduction benefits in complex models. Overall, the implicit-regularizer lens provides a complementary, intuition-rich perspective on convergence dynamics in non-IID federated learning and guides discussions on variance-reduction strategies and their limitations.
Abstract
Backward error analysis finds a modified loss function that the parameter updates actually follow under a given optimization method. The additional loss terms in this modified function are called the implicit regularizer. In this paper, we derive the implicit regularizer for several federated learning algorithms on non-IID data distributions and use it to explain why each method shows different convergence behavior. We first show that the implicit regularizer of FedAvg disperses each client's gradient away from the average gradient, thus increasing the gradient variance. We also show empirically that this implicit regularizer hampers the convergence of FedAvg. Similarly, we compute the implicit regularizers of FedSAM and SCAFFOLD and explain why they converge better. While existing convergence analyses focus on the advantages of FedSAM and SCAFFOLD, our approach can also explain their limitations in complex non-convex settings. Specifically, we demonstrate that FedSAM can partially remove the bias in the first-order term of the implicit regularizer of FedAvg, whereas SCAFFOLD can fully eliminate the bias in the first-order term but not in the second-order term. Consequently, the implicit regularizer provides useful insight into the convergence behavior of federated learning from a different theoretical perspective.
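To make the notion of a modified loss concrete, the following minimal sketch checks it numerically in the simplest single-machine case, not the federated setting analyzed in this paper. It assumes the standard first-order result for plain gradient descent with step size h, namely that the modified loss is L(x) + (h/4) * L'(x)^2, and applies it to a one-dimensional quadratic. The function names and the values of a, h, and x0 are illustrative assumptions.

```python
import math

# Toy loss: L(x) = 0.5 * a * x**2, so L'(x) = a * x.

def gd_step(x, a, h):
    """One gradient-descent step of size h on L."""
    return x - h * a * x

def flow_original(x, a, h):
    """Exact gradient flow of L for time h (dx/dt = -a * x)."""
    return x * math.exp(-a * h)

def flow_modified(x, a, h):
    """Exact gradient flow for time h of the assumed modified loss
    L_mod(x) = L(x) + (h / 4) * L'(x)**2
             = 0.5 * (a + 0.5 * h * a**2) * x**2.
    """
    a_mod = a + 0.5 * h * a * a
    return x * math.exp(-a_mod * h)

# Illustrative values (assumptions, not taken from the paper).
x0, a, h = 1.0, 1.0, 0.1

x_gd = gd_step(x0, a, h)
err_original = abs(x_gd - flow_original(x0, a, h))
err_modified = abs(x_gd - flow_modified(x0, a, h))

print("GD vs. flow of original loss:", err_original)
print("GD vs. flow of modified loss:", err_modified)
```

With these values, the discrete step lies closer to the flow of the modified loss than to the flow of the original loss, which is the sense in which the updates "actually follow" the modified function. The federated regularizers discussed in the abstract add client-dependent terms that this toy example does not capture.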
