Federated Learning of a Mixture of Global and Local Models
Filip Hanzely, Peter Richtárik
TL;DR
This work reframes federated learning as a mixture objective that jointly learns a global model and personalized local models by minimizing $F(x)=f(x)+\lambda\psi(x)$, where $f(x)=\frac{1}{n}\sum_{i=1}^n f_i(x_i)$ is the average local loss over the $n$ devices, $\psi(x)=\frac{1}{2n}\sum_{i=1}^n\|x_i-\bar{x}\|^2$ penalizes the deviation of each local model $x_i$ from the average $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$, and $\lambda\ge 0$ controls the trade-off ($\lambda=0$ yields purely local models, while large $\lambda$ pushes all $x_i$ toward a single global model). It introduces L2GD, a loopless, non-uniform SGD algorithm that randomly alternates between local GD steps and averaging steps, and proves that local steps can reduce communication in heterogeneous data regimes because the method is in effect solving a personalized FL objective. The paper further develops variance-reduced variants (L2SGD+ and L2SGD++) that converge linearly to the global optimum with favorable communication complexity, and provides theoretical convergence rates and optimal participation probabilities. Empirical results on LibSVM datasets corroborate the theory, showing that variance reduction accelerates convergence and that personalization does not harm performance under data heterogeneity. Overall, the framework clarifies the role of local updates in FL, links personalization to reduced communication, and offers scalable, provably efficient algorithms for mixed global-local training.
Abstract
We propose a new optimization formulation for training federated learning models. The standard formulation has the form of an empirical risk minimization problem constructed to find a single global model trained from the private data stored across all participating devices. In contrast, our formulation seeks an explicit trade-off between this traditional global model and the local models, which can be learned by each device from its own private data without any communication. Further, we develop several efficient variants of SGD (with and without partial participation and with and without variance reduction) for solving the new formulation and prove communication complexity guarantees. Notably, our methods are similar but not identical to federated averaging / local SGD, thus shedding some light on the role of local steps in federated learning. In particular, we are the first to i) show that local steps can improve communication for problems with heterogeneous data, and ii) point out that personalization yields reduced communication complexity.
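To make the trade-off between global and local models concrete, the following is a minimal sketch of the mixture objective $F(x)=f(x)+\lambda\psi(x)$ and of a randomized method in the spirit of L2GD. It is an illustration under stated assumptions, not the authors' implementation: the local losses are hypothetical toy quadratics $f_i(x_i)=\frac{1}{2}\|x_i-c_i\|^2$, all function names and the array `c` are invented for this example, and the step scalings $1/(1-p)$ and $1/p$ are simply those that make the random direction an unbiased estimate of $\nabla F$ when an averaging step is taken with probability $p$.

```python
import numpy as np


def objective(x, c, lam):
    """Mixture objective F(x) = f(x) + lam * psi(x).

    x, c: arrays of shape (n, d); row i is device i's model / toy target.
    Assumes the hypothetical local losses f_i(x_i) = 0.5 * ||x_i - c_i||^2.
    """
    n = x.shape[0]
    f = 0.5 * np.sum((x - c) ** 2) / n
    psi = np.sum((x - x.mean(axis=0)) ** 2) / (2 * n)
    return f + lam * psi


def full_gradient(x, c, lam):
    """Exact gradient of F; row i is the block belonging to device i."""
    n = x.shape[0]
    return (x - c) / n + lam * (x - x.mean(axis=0)) / n


def stochastic_gradient(x, c, lam, p, aggregate):
    """Non-uniform estimator of grad F.

    With probability p the caller picks the averaging part (scaled by 1/p),
    otherwise the local part (scaled by 1/(1-p)), so the expectation over
    that choice equals full_gradient(x, c, lam).
    """
    n = x.shape[0]
    if aggregate:
        # Needs the average model, i.e. communication with the server.
        return lam * (x - x.mean(axis=0)) / (n * p)
    # Purely local: each device uses only its own data (here, c_i).
    return (x - c) / (n * (1.0 - p))


def l2gd_sketch(c, lam, p, alpha, num_steps, seed=0):
    """Run the randomized local-step / averaging-step scheme from zeros.

    Returns the final local models and the number of averaging steps taken.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros_like(c, dtype=float)
    num_aggregations = 0
    for _ in range(num_steps):
        aggregate = rng.random() < p  # coin flip replaces a fixed inner loop
        num_aggregations += int(aggregate)
        x = x - alpha * stochastic_gradient(x, c, lam, p, aggregate)
    return x, num_aggregations
```

For these toy losses the minimizer of $F$ has the closed form $x_i^\star=(c_i+\lambda\bar{c})/(1+\lambda)$ with $\bar{c}=\frac{1}{n}\sum_i c_i$, which shows the interpolation directly: $\lambda=0$ gives each device its own optimum $c_i$, and $\lambda\to\infty$ drives every $x_i$ to the shared point $\bar{c}$. With a constant step size, the sketch above only reaches a neighborhood of this solution; removing that residual error is the role the variance-reduced variants play.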
