Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

Ziqing Fan; Ruipeng Zhang; Jiangchao Yao; Bo Han; Ya Zhang; Yanfeng Wang

TL;DR

Partially class-disjoint data (PCDD) in federated learning creates a mismatch between global and local objectives, leading to angle collapse for locally missing classes and space waste for locally existing ones. FedGELA addresses this by globally fixing the classifier as a simplex equiangular tight frame (ETF) and locally adapting to personal distributions with a distribution matrix, enabling balanced, bilateral discrimination. The authors provide convergence guarantees for both global and local tasks and demonstrate consistent performance gains across SVHN, CIFAR10/100, and real-world datasets like Fed-ISIC2019, FEMNIST, and SHAKESPEARE, across varied client scales and straggler settings. This bilateral curation offers a scalable, robust solution for FL under PCDD, with minimal additional communication and computation overhead.
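The globally fixed classifier described above is a simplex ETF. The sketch below uses the standard construction and is not the authors' code. It builds $K$ unit vectors in $d \ge K$ dimensions whose pairwise cosine is exactly $-1/(K-1)$, which is the largest equal separation $K$ unit vectors can have. The function name `simplex_etf`, the QR-based choice of orthonormal basis, and the rows-as-classes layout are illustrative assumptions.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (num_classes, feat_dim) simplex ETF; row c is the vector of class c.

    W = sqrt(K / (K - 1)) * (I_K - 11^T / K) @ U^T, where U (feat_dim x K) has
    orthonormal columns. Every row has unit norm and every pair of rows has
    cosine -1 / (K - 1).
    """
    k = num_classes
    if k < 2 or feat_dim < k:
        raise ValueError("need num_classes >= 2 and feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives U with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, k)))
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * centering @ u.T


W = simplex_etf(num_classes=10, feat_dim=512)
gram = W @ W.T
print(np.allclose(np.diag(gram), 1.0))                     # True: unit norms
print(np.allclose(gram[~np.eye(10, dtype=bool)], -1 / 9))  # True: equal angles
```

Because these vectors are fixed rather than learned, no classifier update has to be aggregated. This matches the point that fixing the classifier avoids inaccurate classifier updates.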

Abstract

Partially class-disjoint data (PCDD), a common yet under-explored data formation in which each client contributes samples from only a subset of classes (rather than all classes), severely challenges the performance of federated algorithms. Without full classes, the local objective contradicts the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem for locally existing classes. To our knowledge, none of the existing methods can intrinsically mitigate the PCDD challenges to achieve holistic improvement in the bilateral views (both the global view and the local view) of federated learning. To address this dilemma, and inspired by the strong generalization of the simplex Equiangular Tight Frame (ETF) on imbalanced data, we propose a novel approach called FedGELA, in which the classifier is globally fixed as a simplex ETF while being locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that FedGELA achieves promising performance (an average improvement of 3.9% over FedAvg and 1.5% over the best baselines), and we provide both local and global convergence guarantees. Source code is available at: https://github.com/MediaBrain-SJTU/FedGELA.git.
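Locally, FedGELA adapts the fixed ETF to each client's personal distribution, and Theorem 1 involves the product $\mathbf{\Phi}_k\mathbf{W}$. The sketch below illustrates that idea under explicit assumptions. It treats $\mathbf{\Phi}_k$ as a diagonal matrix built from the client's class proportions, and it gives locally missing classes a small floor `eps`. This weighting rule and the helper names `distribution_matrix` and `local_logits` are hypothetical and are not the paper's exact definition.

```python
import numpy as np


def distribution_matrix(class_counts, eps=1e-3):
    """Hypothetical Phi_k: diagonal matrix of one client's class proportions.

    Locally missing classes get the floor `eps` instead of 0 so their rows in
    Phi_k @ W shrink without vanishing. This rule is an illustrative assumption.
    """
    counts = np.asarray(class_counts, dtype=float)
    props = counts / counts.sum()           # local class proportions, sum to 1
    return np.diag(np.maximum(props, eps))  # (K, K) diagonal


def local_logits(features, etf, phi):
    """Logits of the locally adapted classifier Phi_k @ W.

    features: (batch, d); etf: (K, d) fixed global ETF, one row per class;
    phi: (K, K) diagonal distribution matrix of this client.
    """
    return features @ (phi @ etf).T  # (batch, K)


# Usage: a 3-class simplex ETF in 2-D (unit vectors 120 degrees apart).
etf = np.array([[1.0, 0.0],
                [-0.5, np.sqrt(3) / 2],
                [-0.5, -np.sqrt(3) / 2]])
phi = distribution_matrix([60, 40, 0])  # class 2 is locally missing
print(local_logits(np.array([[0.0, 1.0]]), etf, phi))
```

Shrinking the rows of missing classes lowers their logits for any feature. Local training then has less need to push features away from those directions. This is one way to read the claim that local adaptation reuses the space of missing classes.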

Paper Structure (40 sections, 11 theorems, 43 equations, 11 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Partially Class-Disjoint Data and Federated Learning algorithms
Simplex Equiangular Tight Frame
Method
Preliminaries
ETF under LPM.
FedAvg.
Contradiction and Motivation
FedGELA
Theoretical Analysis
Notations
Convergence analysis
Experiments
Experimental Setup
...and 25 more sections

Key Result

Theorem 1

If $F_1,\dots,F_N$ are all $L$-smooth and $\mu$-strongly convex, and the variance and norm of $\nabla F_1,\dots,\nabla F_N$ are bounded by $\sigma$ and $G$, respectively, choose $\kappa=L/\mu$ and $\gamma=\max\{8\kappa, E\}$. Then, for all classes $c$ and samples $i$, the expected global representation under the cross-entropy loss will c… where in FedGELA, $B = \sum_{k=1}^N (p_k^2 \sigma^2 + p_k ||\mathbf{\Phi}_k\mathbf{W}^L - \mathbf{W…
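The constants $\kappa=L/\mu$ and $\gamma=\max\{8\kappa, E\}$ also appear in the well-known convergence analysis of FedAvg on non-IID data (Li et al., ICLR 2020). In that analysis they come with the decaying step size $\eta_t = 2/(\mu(\gamma+t))$. The statement above is truncated, so the sketch below assumes FedGELA uses the same schedule and computes $\kappa$, $\gamma$, and the step sizes. The helper name `theorem_constants` is hypothetical. The two parts of $\gamma$ have clear roles: $\gamma \ge 8\kappa$ gives $\eta_0 \le 1/(4L)$, and $\gamma \ge E$ ensures $\eta_{t+E} \ge \eta_t/2$, so the step size drops by at most half within one round of $E$ local steps.

```python
def theorem_constants(L: float, mu: float, E: int, num_steps: int):
    """Return (kappa, gamma, step sizes), assuming eta_t = 2 / (mu * (gamma + t))."""
    kappa = L / mu             # condition number
    gamma = max(8 * kappa, E)  # gamma = max{8 kappa, E}
    etas = [2.0 / (mu * (gamma + t)) for t in range(num_steps)]
    return kappa, gamma, etas


kappa, gamma, etas = theorem_constants(L=10.0, mu=1.0, E=5, num_steps=100)
print(kappa, gamma)  # 10.0 80.0
# gamma >= 8 * kappa keeps the first step within 1 / (4L).
assert etas[0] <= 1 / (4 * 10.0) + 1e-12
# gamma >= E keeps the step from dropping by more than half within E steps.
assert all(etas[t + 5] >= etas[t] / 2 for t in range(len(etas) - 5))
```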

Figures (11)

Figure 1: Illustration of feature spaces and classifier vectors trained on the global dataset, on two partially class-disjoint datasets (A and B), and under the restriction of federated algorithms. (a) is trained on the globally balanced dataset with full classes. (b) and (c) are trained on datasets A and B, respectively, which suffer from different patterns of the classifier angle collapse problem. (d) is obtained by averaging on the server or by constraining training with certain federated algorithms.
Figure 2: Averaged angles of classifier vectors between locally existing classes (existing angle) and between locally missing classes (missing angle) on CIFAR10 (Dir ($\beta=0.1$)), measured on a local client and on the aggregated global server (10 local epochs). Globally, the "existing" and "missing" angles converge to similar values, while locally the "existing" angle expands but the "missing" angle shrinks.
Figure 3: Illustration of local and global convergence verification together with the effect of $\mathbf{\Phi}$. (a) and (b) show the averaged angle between all class means and between locally existing class means for FedAvg, FedGE, and FedGELA on CIFAR10 with 50 clients and Dir ($\beta=0.2$). (c) illustrates how local adaptation utilizes the wasted space of missing classes for existing classes.
Figure 4: Bilateral performance on four datasets when tuning $\log E_W$ (x-axis) of FedGELA.
Figure 5: Illustration of the averaged angle between locally existing classes and missing classes on the local client and global server of FedAvg, FedGE, and our FedGELA on CIFAR10.
...and 6 more figures
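Figures 2, 3, and 5 summarize classifier (or class-mean) geometry through averaged angles between locally existing classes and between locally missing classes. The sketch below shows one way to compute that measurement. It assumes "averaged angle" means the mean over all unordered pairs within each group, and the paper's exact averaging may differ. The helper names are hypothetical. The same function applies to a matrix of class-mean features.

```python
import itertools

import numpy as np


def mean_pairwise_angle(vectors, classes):
    """Mean angle in degrees over all unordered pairs of rows listed in `classes`."""
    angles = []
    for i, j in itertools.combinations(classes, 2):
        a, b = vectors[i], vectors[j]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        # clip guards arccos against rounding slightly outside [-1, 1]
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return float(np.mean(angles)) if angles else float("nan")  # nan if < 2 classes


def existing_missing_angles(classifier, class_counts):
    """(existing angle, missing angle) of a (K, d) classifier for one client."""
    counts = np.asarray(class_counts)
    existing = np.flatnonzero(counts > 0).tolist()  # classes the client has
    missing = np.flatnonzero(counts == 0).tolist()  # classes the client lacks
    return (mean_pairwise_angle(classifier, existing),
            mean_pairwise_angle(classifier, missing))


# Usage: existing classes pushed apart, missing classes collapsed together.
clf = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(existing_missing_angles(clf, [50, 50, 0, 0]))  # existing near 180, missing near 0
```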

Theorems & Definitions (17)

Theorem 1: Global Convergence
Theorem 2: Local Convergence
Lemma 1: ETF
Lemma 2: Fixing classifier as ETF
Lemma 3: Results of one-step SGD [fedproto]
Lemma 4: Results of one-step SGD [fedskipthem_noniid]
Lemma 5: Math tool from Stich [stich]
Lemma 6: Bounding the variance [fedskipthem_noniid]
Lemma 7
proof
...and 7 more
