Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity
Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich
TL;DR
The paper addresses convergence challenges in heterogeneous federated learning (FL) by establishing a quantitative link between data shuffling and optimization speed. It shows that shuffling a fraction $p$ of the data across clients reduces gradient dissimilarity quadratically in $p$, which enables faster convergence, and it provides convergence-rate bounds for strongly convex and non-convex objectives. Building on this theory, the authors propose Fedssyn, a practical framework in which clients train synthetic data generators locally and shuffle the generated synthetic data instead of the raw data, preserving data access rights and optionally offering differential privacy (DP). Empirically, Fedssyn reduces the number of communication rounds and improves accuracy on CIFAR-10 and CIFAR-100 under varying client participation, and its DP variants show that the approach remains viable with privacy guarantees. The approach thus offers a principled, privacy-aware way to narrow the performance gap caused by data heterogeneity in FL.
Abstract
In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
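The quadratic reduction can be illustrated with a toy calculation. The sketch below is not the paper's code; it assumes a simplified setting in which each client $i$ minimizes a one-dimensional quadratic $f_i(x) = \frac{1}{2}\mathbb{E}[(x - z)^2]$ over its own data, so that $\nabla f_i(x) = x - \mu_i$ with $\mu_i$ the local data mean. It further assumes that shuffling a fraction $p$ replaces that share of each client's data, in expectation, with a uniform mix of all clients' data. All function names are hypothetical.

```python
def gradient_dissimilarity(client_means, x=0.0):
    """Average squared gap between local and global gradients at x.

    Toy assumption: grad f_i(x) = x - mean_i for each client i.
    """
    n = len(client_means)
    local_grads = [x - m for m in client_means]
    global_grad = sum(local_grads) / n
    return sum((g - global_grad) ** 2 for g in local_grads) / n


def shuffled_means(client_means, p):
    """Expected local means after a fraction p of each client's data is
    replaced by a uniform mix of all clients' data (toy assumption)."""
    global_mean = sum(client_means) / len(client_means)
    return [(1 - p) * m + p * global_mean for m in client_means]


def dissimilarity_ratio(client_means, p):
    """Gradient dissimilarity after shuffling divided by the value before."""
    before = gradient_dissimilarity(client_means)
    after = gradient_dissimilarity(shuffled_means(client_means, p))
    return after / before


if __name__ == "__main__":
    means = [-2.0, 0.0, 1.0, 5.0]  # hypothetical heterogeneous client means
    for p in (0.0, 0.25, 0.5, 1.0):
        # Compare the measured ratio with the quadratic factor (1 - p)^2.
        print(p, dissimilarity_ratio(means, p), (1 - p) ** 2)
```

In this toy model the ratio equals $(1 - p)^2$ up to floating-point error, so shuffling half of the data cuts the dissimilarity to a quarter of its original value. The paper's general result covers broader objectives and is stated in terms of its own dissimilarity parameters.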
