Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!

Milad Sefidgaran; Romain Chor; Abdellatif Zaidi; Yijun Wan

TL;DR

This work analyzes how the generalization error in Federated Learning (FL) evolves with the number of communication rounds $R$, using a formal multi-round FL model with $K$ clients and a parameter server. It develops PAC-Bayes and rate-distortion bounds that explicitly capture the effect of $R$, alongside the number of clients $K$ and the per-client dataset size $n$, and applies them to Federated SVM (FSVM), where the bound increases with $R$, suggesting that more frequent communication can hurt generalization. The FSVM analysis is complemented by experiments with MNIST and CIFAR-10 (ResNet-56) showing that the population risk may decrease more slowly with $R$ than the empirical risk and can even start to increase beyond some critical value $R^*$. The bounds also suggest that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Overall, the paper provides a theoretical framework for understanding the trade-offs in FL between communication, data distribution, and generalization, with practical guidance on choosing the number of rounds $R$ and potential regularization strategies.

Abstract

We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server (PS), i.e., the effect on the generalization error of how often the clients' local models are aggregated at the PS. In our setup, the more the clients communicate with the PS, the less data they use for local training in each round, so that the total amount of training data per client is identical for distinct values of $R$. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$, in addition to the number of participating devices $K$ and the size $n$ of the individual datasets. The bounds, which apply to a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM) and derive more explicit bounds in this case. In particular, we show that the generalization bound of FSVM increases with $R$, suggesting that more frequent communication with the PS diminishes the generalization power. This implies that the population risk decreases more slowly with $R$ than does the empirical risk. Moreover, our bound suggests that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Finally, we provide experimental results obtained using neural networks (ResNet-56) which show evidence that not only may our observations for FSVM hold more generally but also that the population risk may even start to increase beyond some value of $R$.
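The multi-round setup described above can be illustrated with a minimal sketch, under stated assumptions. It is not the authors' implementation: the linear model, the hinge-loss SGD step, plain averaging at the PS, and the names `local_sgd`, `federated_train`, and `zero_one_risk` are all hypothetical choices made for illustration. The one feature taken from the abstract is that each client splits its $n$ samples into $R$ disjoint chunks of size $n/R$, so the total data used per client does not depend on $R$.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1):
    """One client's local update: a single SGD pass on the hinge loss (assumed loss)."""
    w = w.copy()
    for x_i, y_i in zip(X, y):
        if y_i * (x_i @ w) < 1.0:   # margin violated: hinge subgradient is -y_i * x_i
            w += lr * y_i * x_i
    return w

def federated_train(client_data, R, dim, lr=0.1):
    """client_data: list of K (X, y) pairs with n samples each; n is assumed divisible by R."""
    w_bar = np.zeros(dim)           # aggregated model held by the parameter server
    for r in range(R):
        local_models = []
        for X, y in client_data:
            m = len(y) // R         # each round uses a fresh chunk of n / R samples
            chunk = slice(r * m, (r + 1) * m)
            local_models.append(local_sgd(w_bar, X[chunk], y[chunk], lr))
        w_bar = np.mean(local_models, axis=0)   # assumed aggregation: simple averaging
    return w_bar

def zero_one_risk(w, X, y):
    """Fraction of misclassified points for the linear classifier sign(X @ w)."""
    return float(np.mean(np.sign(X @ w) != y))

if __name__ == "__main__":
    # Synthetic, linearly separable data (hypothetical; not the paper's datasets).
    rng = np.random.default_rng(0)
    K, n, dim = 10, 100, 5
    w_true = rng.normal(size=dim)

    def sample(size):
        X = rng.normal(size=(size, dim))
        return X, np.sign(X @ w_true)

    clients = [sample(n) for _ in range(K)]
    X_train = np.vstack([X for X, _ in clients])
    y_train = np.concatenate([y for _, y in clients])
    X_test, y_test = sample(5000)

    for R in (1, 5, 10):
        w = federated_train(clients, R, dim)
        gap = zero_one_risk(w, X_test, y_test) - zero_one_risk(w, X_train, y_train)
        print(f"R={R:2d}  estimated generalization gap={gap:+.4f}")
```

The script prints an empirical estimate of the generalization gap (held-out risk minus training risk) for a few values of $R$. On a toy problem like this the trend need not match the paper's findings; the sketch only shows how $R$, $K$, and $n$ enter the training loop.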

Paper Structure (51 sections, 13 theorems, 156 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Formal problem setup
Generalization error.
Example (FL-SGD).
Generalization bounds for Federated Learning algorithms
PAC-Bayes bounds
(i)
(ii)
Rate-distortion theoretic bounds
Federated Support Vector Machine (FSVM)
Experiments
On population risk of Federated Learning algorithms
Federated learning algorithm
Extension of the considered Federated Learning setup
Generalization error.
...and 36 more sections

Key Result

Theorem 3.1

Assume that the loss $\ell(Z_k,w)$ is $\sigma$-subgaussian for every $w\in \mathcal{W}$ and any $k\in [K]$. Also, for every $k \in [K]$ and $r \in [R]$, let $\mathsf{P}_{k,r}$ denote a conditional prior on $W^{(r)}_k$ given $\overline{W}^{(r-1)}$. Then

Figures (9)

Figure 1: Multi-round Federated Learning
Figure 2: Generalization error of FSVM with the bound of Theorem \ref{th:svm}, and empirical and population risks of FSVM with the same bound, as functions of $R$, for $n=100$
Figure 3: Evolution of the performance of FL-SGD with ResNet-56 on CIFAR-10 as a function of $R$.
Figure 4: Generalization error of FSVM and bound of Theorem \ref{th:svm} w.r.t. $R$, for $K = 10$
Figure 5: (a) Empirical risk and (b) population risk of FSVM w.r.t. $R$, for $K = 10$
...and 4 more figures

Theorems & Definitions (25)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Theorem 4.1
Theorem 2.1
Theorem 4.1
Proposition 4.2
Proposition 4.3
Proposition 4.4
proof
...and 15 more
