Harnessing Increased Client Participation with Cohort-Parallel Federated Learning

Akash Dhasade; Anne-Marie Kermarrec; Tuan-Anh Nguyen; Rafael Pires; Martijn de Vos

TL;DR

Cohort-Parallel Federated Learning (CPFL) tackles diminishing returns in federated learning by partitioning the client network into multiple cohorts that train in parallel, then unifying their models through knowledge distillation on a public unlabeled dataset. This architecture yields substantial reductions in training time and CPU resource usage, with modest accuracy loss, especially under non-IID data distributions; four cohorts can achieve around 1.9x faster convergence and 1.3x lower resource use on CIFAR-10 non-IID tasks. The authors provide a domain-adaptation-based theoretical bound for the distilled global model and support their claims with extensive experiments on CIFAR-10 and FEMNIST using realistic traces, highlighting the trade-offs between the number of cohorts, accuracy, and compute. CPFL thus offers a tunable, scalable approach to practical FL deployment, enabling practitioners to tailor resource usage and convergence timelines while preserving performance.

Abstract

Federated learning (FL) is a machine learning approach where nodes collaboratively train a global model. As more nodes participate in a round of FL, however, the effectiveness of each node's individual model update diminishes. In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or cohorts. We introduce Cohort-Parallel Federated Learning (CPFL): a novel learning approach where each cohort independently trains a global model using FL until convergence, and the models produced by the cohorts are then unified using knowledge distillation. The insight behind CPFL is that smaller, isolated networks converge more quickly than a single network in which all nodes participate. Through exhaustive experiments involving realistic traces and non-IID data distributions on the CIFAR-10 and FEMNIST image classification tasks, we investigate the balance between the number of cohorts, model accuracy, training time, and compute resources. Compared to traditional FL, CPFL with four cohorts on CIFAR-10 under a non-IID data distribution yields a 1.9x reduction in training time and a 1.3x reduction in resource usage, with a minimal drop in test accuracy.
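The three steps named in the abstract (partition the clients into cohorts, run FL inside each cohort, then unify the cohort models through knowledge distillation) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function names, the random partitioning, the sample-size-weighted averaging, and the weighted mean of softmax outputs used as distillation targets are all assumptions made for the example.

```python
import math
import random


def partition_into_cohorts(client_ids, n_cohorts, seed=0):
    """Randomly split clients into n_cohorts disjoint, near-equal groups.

    Random assignment is an assumption of this sketch.
    """
    ids = list(client_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_cohorts] for i in range(n_cohorts)]


def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of flat parameter vectors.

    Stands in for one aggregation round inside a single cohort.
    """
    total = float(sum(client_sizes))
    dim = len(client_weights[0])
    return [
        sum(w[j] * s / total for w, s in zip(client_weights, client_sizes))
        for j in range(dim)
    ]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_targets(cohort_logits, cohort_weights):
    """Soft label for one public, unlabeled sample.

    Each cohort model (teacher) contributes its softmax output; the
    weights are hypothetical and are normalized to sum to one here.
    """
    total = float(sum(cohort_weights))
    probs = [softmax(logits) for logits in cohort_logits]
    n_classes = len(probs[0])
    return [
        sum(p[c] * w / total for p, w in zip(probs, cohort_weights))
        for c in range(n_classes)
    ]


# Usage: 10 clients in 4 cohorts; each cohort would then train on its own.
cohorts = partition_into_cohorts(range(10), n_cohorts=4)
```

A student model would then be trained on the public dataset to match these soft labels; that training loop is omitted because it depends on the model and framework in use.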

Paper Structure (22 sections, 4 theorems, 8 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Contributions.
Related Work
Parallelism in Federated Learning.
Knowledge Distillation (KD)
CPFL extremes.
Cohort-Parallel Federated Learning
Algorithm overview
Cross-domain analysis
Evaluation
Experiment setup
Time and resource savings by CPFL
Training time of cohorts
Cohort data samples and training time
Teacher and student accuracies of KD
...and 7 more sections

Key Result

Theorem 1

Let $\mathcal{H}$ be a finite hypothesis class and $h_s := \sum_{i=1}^n p_i h_i$, where $p_i > 0$ and $\sum_{i=1}^n p_i = 1$. Suppose that each source dataset has $m$ instances. Then, for any $\delta \in (0,1)$, with probability at least $1-\delta$, the expected risk of $h_s$ on the target distribution ... where $\lambda_{i,k} := \inf_{h \in \mathcal{H}} \left\{\mathcal{L}_{\mathcal{D}_{i,k}} (h) + \dots \right\}$
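
As a concrete instance of the combined hypothesis $h_s$, consider $n = 4$ cohorts with uniform weights. The uniform choice $p_i = 1/4$ is an assumption made here purely for illustration; the statement above only requires $p_i > 0$ and $\sum_{i=1}^n p_i = 1$.

$$h_s(x) = \sum_{i=1}^{4} p_i h_i(x) = \tfrac{1}{4}\bigl(h_1(x) + h_2(x) + h_3(x) + h_4(x)\bigr), \qquad p_1 = p_2 = p_3 = p_4 = \tfrac{1}{4}.$$

Under this choice every cohort model contributes equally to the combined prediction, and the weights satisfy both conditions since $4 \cdot \tfrac{1}{4} = 1$.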

Figures (8)

Figure 1: The architecture of Cohort-Parallel Federated Learning (CPFL).
Figure 2: Comparing validation loss for partitioned (solid curves) and unpartitioned (dashed curve) networks across IID (left figure) and non-IID (right figure) distributions. The vertical dotted line denotes the convergence point of the training. Additional details on the experiment setup are provided in the paper (section label sec:setup_motivation_plot).
Figure 3: The test accuracy, convergence time and resource usage (in CPU hours) of CIFAR-10, for increasing number of cohorts ($n$) and different heterogeneity levels (controlled by $\alpha$). Results for $\alpha = 0.3$ are included in the paper (section label sec:app_cifar10).
Figure 4: The test accuracy, convergence time and resource usage (in CPU hours) of FEMNIST, for increasing number of cohorts ($n$).
Figure 5: The finish times of individual cohorts, for different numbers of total cohorts and data distributions. We mark the finish time of each group with a symbol.
...and 3 more figures

Theorems & Definitions (4)

Theorem 1
Lemma 2: Theorem 3, domain-adaptation-theory
Lemma 3
Theorem 1
