Towards Understanding Generalization and Stability Gaps between Centralized and Decentralized Federated Learning

Yan Sun, Li Shen, Dacheng Tao

TL;DR

The paper analyzes generalization and stability gaps between Centralized Federated Learning (CFL) and Decentralized Federated Learning (DFL) on smooth non-convex objectives. It develops a uniform stability framework that does not assume bounded full gradients and derives explicit excess-risk bounds for FedAvg (CFL) and D-FedAvg (DFL), showing CFL generalizes no worse than DFL and that partial participation can be optimal for CFL. It further identifies a minimal topology requirement for DFL to avoid performance collapse as the client count grows and characterizes how topology and participation shape generalization. Extensive experiments on CIFAR-10 with Dirichlet-heterogeneous data validate the theory, demonstrating CFL’s superior test accuracy at larger scales and clarifying scenarios where DFL may be preferable due to communication constraints.

Abstract

As the two mainstream frameworks in federated learning (FL), centralized and decentralized approaches have both shown great value in practical applications. However, existing studies provide neither sufficient evidence nor clear guidance on which of the two performs better. Although decentralized methods have been proven to achieve convergence comparable to that of centralized methods with less communication, their test performance consistently falls short of expectations in empirical studies. To compare the two fairly and comprehensively, this paper examines their stability and generalization. Specifically, we prove that on general smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) CFL achieves its best performance with partial rather than full participation; and 3) there is a necessary requirement on the topology in DFL to avoid performance collapse as the training scale increases. We also conduct extensive experiments on several common FL setups, which show that the theoretical analysis is consistent with empirical behavior and holds in several general and practical scenarios.
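The structural difference between the two frameworks can be made concrete with a toy aggregation step. The following is a minimal sketch under stated assumptions, not the paper's algorithms: each client model is reduced to a single float, local training is omitted, and the decentralized step uses a ring topology with a uniform 1/3 weight on a client and its two neighbors. The function names (`centralized_round`, `ring_mixing_matrix`, `decentralized_round`) are hypothetical and chosen for illustration only.

```python
import random


def centralized_round(models, n_active, rng):
    """Server-style aggregation: average a random subset of client models
    (partial participation) and broadcast the result to every client."""
    active = rng.sample(range(len(models)), n_active)
    avg = sum(models[i] for i in active) / n_active
    return [avg] * len(models), active


def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    weight 1/3 on the client itself and on each of its two neighbors."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def decentralized_round(models, W):
    """Gossip-style aggregation: each client takes a weighted average
    of the models it can see according to the mixing matrix W."""
    m = len(models)
    return [sum(W[i][j] * models[j] for j in range(m)) for i in range(m)]


# Toy run with 8 clients whose "models" are the scalars 0.0 .. 7.0.
models = [float(i) for i in range(8)]
W = ring_mixing_matrix(8)
cfl_models, active = centralized_round(models, 4, random.Random(0))
dfl_models = decentralized_round(models, W)
```

In this toy setting, one centralized round leaves every client with an identical model, while one ring round preserves the global average but leaves the clients in disagreement. How quickly that disagreement shrinks depends on the topology, which is the quantity the paper's third result concerns.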

Paper Structure

This paper contains 30 sections, 50 equations, 8 figures, 7 tables, 2 algorithms.

Figures (8)

  • Figure 1: Some classical topologies in the decentralized approaches.
  • Figure 2: We test different active ratios in CFL on the CIFAR-10 dataset with the ResNet-18 model. $m$ is the number of clients and $E$ is the number of local epochs. Each setup is repeated 5 times.
  • Figure 3: We test different batch sizes in CFL on the Dirichlet-0.1 split of the CIFAR-10 dataset with the ResNet-18 model. $m$ is the number of clients and $E$ is the number of local epochs.
  • Figure 4: We test different batch sizes in DFL on the Dirichlet-0.1 split of the CIFAR-10 dataset with the ResNet-18 model. $m$ is the number of clients and $E$ is the number of local epochs.
  • Figure 5: Loss curves of different active ratios in CFL.
  • ...and 3 more figures
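
The captions above refer to a "Dirichlet-0.1 split". The text here does not spell out the exact partitioning procedure, so the following is only a sketch of one common way to build such a split: for each class, client proportions are drawn from a symmetric Dirichlet distribution with concentration `alpha`, and that class's samples are divided accordingly. The function names and the per-class scheme are assumptions for illustration; smaller `alpha` gives more skewed label distributions per client.

```python
import random


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k entries
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_label_split(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients, class by class, using
    Dirichlet-distributed client proportions (assumed scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so that no sample is dropped.
            if c == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), int(round(cum * len(idxs))))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy labels: 1000 samples spread evenly over 10 classes.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_label_split(labels, num_clients=10, alpha=0.1)
```

Every index lands on exactly one client, so the split is a partition of the dataset; client sizes are generally unequal under this scheme.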