Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

TL;DR

Partially class-disjoint data (PCDD) in federated learning creates a mismatch between global and local objectives, leading to angle collapse for locally missing classes and space waste for locally existing ones. FedGELA addresses this by globally fixing the classifier as a simplex equiangular tight frame (ETF) and locally adapting to personal distributions with a distribution matrix, enabling balanced, bilateral discrimination. The authors provide convergence guarantees for both global and local tasks and demonstrate consistent performance gains across SVHN, CIFAR10/100, and real-world datasets like Fed-ISIC2019, FEMNIST, and SHAKESPEARE, across varied client scales and straggler settings. This bilateral curation offers a scalable, robust solution for FL under PCDD, with minimal additional communication and computation overhead.

Abstract

Partially class-disjoint data (PCDD), a common yet under-explored data formation in which each client contributes samples from only a subset of classes (instead of all classes), severely challenges the performance of federated algorithms. Without full classes, the local objective contradicts the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem for locally existing classes. As far as we know, none of the existing methods can intrinsically mitigate PCDD challenges to achieve holistic improvement in the bilateral views (both global view and local view) of federated learning. To address this dilemma, we are inspired by the strong generalization of the simplex Equiangular Tight Frame (ETF) on imbalanced data, and propose a novel approach called FedGELA, in which the classifier is globally fixed as a simplex ETF while locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that FedGELA achieves promising performance (average improvement of 3.9% over FedAvg and 1.5% over the best baselines) and provide both local and global convergence guarantees. Source code is available at: https://github.com/MediaBrain-SJTU/FedGELA.git.
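
The idea of a globally fixed simplex ETF classifier can be made concrete with a small NumPy sketch. It builds a simplex ETF using the standard construction (a centered, rescaled matrix with orthonormal columns), so every pair of class vectors has cosine $-1/(C-1)$. The function names, the `feat_dim >= num_classes` restriction, and in particular the `locally_adapted` scaling by local class frequency are assumptions made for illustration only. The abstract does not specify the exact form of FedGELA's local adaptation, so this should not be read as the authors' implementation.

```python
import numpy as np


def simplex_etf(num_classes, feat_dim, seed=0):
    """Return a (feat_dim, num_classes) simplex ETF with unit-norm columns.

    Standard construction: sqrt(C / (C - 1)) * U (I - 11^T / C),
    where U has orthonormal columns.
    """
    if feat_dim < num_classes:
        raise ValueError("this sketch assumes feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix gives orthonormal columns U.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    c = num_classes
    centering = np.eye(c) - np.ones((c, c)) / c
    return np.sqrt(c / (c - 1)) * (u @ centering)


def locally_adapted(etf, class_counts):
    """Hypothetical local adaptation (an assumption, not the paper's formula).

    Scales each class vector by the client's local class frequency, so
    classes missing on the client get zero-length vectors.
    """
    freq = np.asarray(class_counts, dtype=float)
    freq = freq / freq.sum()
    return etf * freq[np.newaxis, :]


if __name__ == "__main__":
    w = simplex_etf(num_classes=10, feat_dim=64)
    gram = w.T @ w
    print("diagonal (squared norms):", np.round(np.diag(gram), 3))
    print("off-diagonal cosine:", round(float(gram[0, 1]), 4))
```

Because the global classifier is fixed rather than learned, clients would not need to communicate or average classifier weights in this sketch; only the feature extractor would be trained and aggregated.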

Paper Structure

This paper contains 40 sections, 11 theorems, 43 equations, 11 figures, 11 tables, 1 algorithm.

Key Result

Theorem 1

If $F_1,\dots,F_N$ are all $L$-smooth and $\mu$-strongly convex, and the variance and norm of $\nabla F_1,\dots,\nabla F_N$ are bounded by $\sigma$ and $G$, then choosing $\kappa=L/\mu$ and $\gamma=\max\{8\kappa, E\}$, for all classes $c$ and samples $i$, the expected global representation under cross-entropy loss will c… where in FedGELA, $B = \sum_{k=1}^N (p_k^2 \sigma^2 + p_k \|\mathbf{\Phi}_k\mathbf{W}^L - \mathbf{W}\dots$

Figures (11)

  • Figure 1: Illustration of feature spaces and classifier vectors trained on the global dataset, two partially class-disjoint datasets (A and B), and restricted by federated algorithms. (a) is trained on the globally balanced dataset with full classes. (b) and (c) are trained on datasets A and B, respectively, which suffer from different patterns of classifier angle collapse problems. (d) is averaged in the server or constrained by some federated algorithms.
  • Figure 2: Averaged angles of classifier vectors between locally existing classes (existing angle) and between locally missing classes (missing angle) on CIFAR10 (Dir ($\beta=0.1$)) in local client and aggregated in global server (local epoch is 10). In global, "existing" angle and "missing" angle converge to similar values while in the local, "existing" angle expands but "missing" angle shrinks.
  • Figure 3: Illustration of local and global convergence verification together with the effect of $\mathbf{\Phi}$. (a) and (b) are the results of averaged angle between all class means and between locally existing class means in FedAvg, FedGE, and FedGELA on CIFAR10 under 50 clients and Dir ($\beta=0.2$). (c) is the illustration of how local adaptation utilizes the wasted space of missing classes for existing classes.
  • Figure 4: Bilateral performance on four datasets by tuning $\log E_W$ (x-axis) of FedGELA.
  • Figure 5: Illustration of the averaged angle between locally existing classes and missing classes on the local client and global server of FedAvg, FedGE, and our FedGELA on CIFAR10.
  • ...and 6 more figures

Theorems & Definitions (17)

  • Theorem 1: Global Convergence
  • Theorem 2: Local Convergence
  • Lemma 1: ETF
  • Lemma 2: Fixing classifier as ETF
  • Lemma 3: Results of one step SGD [fedproto]
  • Lemma 4: Results of one step SGD [fedskipthem_noniid]
  • Lemma 5: Math tool from Stich [stich]
  • Lemma 6: Bounding the variance [fedskipthem_noniid]
  • Lemma 7
  • proof
  • ...and 7 more