Harnessing Increased Client Participation with Cohort-Parallel Federated Learning
Akash Dhasade, Anne-Marie Kermarrec, Tuan-Anh Nguyen, Rafael Pires, Martijn de Vos
TL;DR
Cohort-Parallel Federated Learning (CPFL) tackles diminishing returns in federated learning by partitioning the client network into multiple cohorts that train in parallel, then unifying their models through knowledge distillation on a public unlabeled dataset. This architecture yields substantial reductions in training time and CPU resource usage, with modest accuracy loss, especially under non-IID data distributions; four cohorts can achieve around 1.9x faster convergence and 1.3x lower resource use on CIFAR-10 non-IID tasks. The authors provide a domain-adaptation-based theoretical bound for the distilled global model and support their claims with extensive experiments on CIFAR-10 and FEMNIST using realistic traces, highlighting the trade-offs between the number of cohorts, accuracy, and compute. CPFL thus offers a tunable, scalable approach to practical FL deployment, enabling practitioners to tailor resource usage and convergence timelines while preserving performance.
Abstract
Federated learning (FL) is a machine learning approach where nodes collaboratively train a global model. As more nodes participate in a round of FL, the effectiveness of each node's individual model update diminishes. In this study, we increase the effectiveness of client updates by dividing the network into smaller partitions, or cohorts. We introduce Cohort-Parallel Federated Learning (CPFL): a novel learning approach where each cohort independently trains a global model using FL until convergence, and the models produced by the cohorts are then unified using knowledge distillation. The insight behind CPFL is that smaller, isolated networks converge more quickly than a single network in which all nodes participate. Through exhaustive experiments involving realistic traces and non-IID data distributions on the CIFAR-10 and FEMNIST image classification tasks, we investigate the balance between the number of cohorts, model accuracy, training time, and compute resources. Compared to traditional FL, CPFL with four cohorts on CIFAR-10 with a non-IID data distribution yields a 1.9x reduction in training time and a 1.3x reduction in resource usage, with a minimal drop in test accuracy.
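The pipeline described in the abstract (partition clients into cohorts, train each cohort independently, then distill the cohort models into one) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the random near-equal partitioning, and the uniform averaging of cohort logits are all assumptions, since the abstract does not specify them. Per-cohort FL training is omitted, and random arrays stand in for the cohort models' outputs on the public unlabeled dataset.

```python
import numpy as np


def partition_into_cohorts(client_ids, n_cohorts, seed=0):
    """Randomly split client IDs into disjoint, near-equal cohorts (assumed scheme)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(client_ids)
    return [part.tolist() for part in np.array_split(shuffled, n_cohorts)]


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def distillation_targets(cohort_logits):
    """Turn cohort (teacher) logits into soft labels for the public dataset.

    cohort_logits has shape (n_cohorts, n_samples, n_classes).
    Uniform averaging across cohorts is an assumption made for this sketch.
    """
    mean_logits = np.mean(cohort_logits, axis=0)
    return np.exp(log_softmax(mean_logits))


def distillation_loss(student_logits, soft_targets):
    """Mean cross-entropy between teacher soft labels and student predictions."""
    return float(-(soft_targets * log_softmax(student_logits)).sum(axis=-1).mean())


if __name__ == "__main__":
    # Hypothetical setup: 10 clients, 4 cohorts, 5 public samples, 10 classes.
    cohorts = partition_into_cohorts(list(range(10)), n_cohorts=4)
    print("cohort sizes:", [len(c) for c in cohorts])

    # Stand-in for the logits each trained cohort model would produce.
    rng = np.random.default_rng(1)
    cohort_logits = rng.normal(size=(4, 5, 10))
    targets = distillation_targets(cohort_logits)

    # A real student model would be trained to minimize this loss.
    student_logits = np.zeros((5, 10))
    print("initial distillation loss:", distillation_loss(student_logits, targets))
```

In a full system, each cohort would run its own FL session to convergence before its model is evaluated on the public data, and the student would be trained by gradient descent on `distillation_loss`. Because the cohorts run independently, the overall training time is bounded by the slowest cohort rather than by a single session over all clients.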
