Neural Collapse-Inspired Multi-Label Federated Learning under Label-Distribution Skew

Can Peng, Yuyuan Liu, Yingyu Yang, Pramit Saha, Qianye Yang, J. Alison Noble

TL;DR

This work tackles multi-label federated learning under label skew by introducing FedNCA-ML, which enforces Neural Collapse–inspired geometry to align feature distributions across non-IID clients. It combines a Label-Aware Disentanglement Module (LADM), which extracts per-class features, with a fixed Equiangular Tight Frame (ETF) classifier that imposes a global simplex ETF structure, and adds two regularizers that suppress noisy negatives and promote compact intra-class clustering. Across four datasets in eight non-IID settings, the approach achieves average gains of up to $3.92\%$ in macro-AUC and $4.93\%$ in macro-F1, indicating improved robustness and generalization in challenging multi-label FL scenarios. The framework offers a principled way to preserve global label geometry in heterogeneous FL, with practical relevance to medical imaging and other multi-label domains where label-distribution skew is prevalent.

Abstract

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet it remains challenging because data distributions can be highly heterogeneous. These challenges are further amplified in multi-label scenarios, where data exhibit characteristics such as label co-occurrence, inter-label dependency, and discrepancies between local and global label relationships. While most existing FL studies focus on single-label classification, real-world applications such as medical imaging often involve multi-label data with highly skewed label distributions across clients. To address this important yet underexplored problem, we propose FedNCA-ML, a novel FL framework that aligns feature distributions across clients and learns discriminative, well-clustered representations inspired by Neural Collapse (NC) theory. NC describes an ideal latent-space geometry in which each class's features collapse to their class mean and the class means form a maximally separated simplex. To extend this theory to multi-label settings, we introduce a feature disentanglement module that extracts class-specific representations. The clustering of these disentangled features is guided by a shared NC-inspired structure, mitigating conflicts among client models caused by heterogeneous local data. Furthermore, we design regularisation losses to encourage compact and consistent feature clustering in the latent space. Experiments on four benchmark datasets under eight FL settings demonstrate the effectiveness of the proposed method, achieving improvements of up to 3.92% in class-wise AUC and 4.93% in class-wise F1 score.
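The simplex geometry referred to in the abstract can be illustrated with the standard simplex ETF construction from the Neural Collapse literature, $M = \sqrt{K/(K-1)}\, U (I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$, where $U$ has orthonormal columns. The sketch below is a minimal NumPy illustration, not the authors' implementation: the function name `simplex_etf`, the feature dimension, and the use of a shared random seed so that every client builds the same fixed matrix are assumptions made for this example.

```python
import numpy as np

def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes >= 2.
    """
    if num_classes < 2 or feat_dim < num_classes:
        raise ValueError("this construction assumes feat_dim >= num_classes >= 2")
    rng = np.random.default_rng(seed)  # same seed -> same matrix on every client (assumption)
    # Reduced QR gives U with orthonormal columns, shape (feat_dim, num_classes).
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k  # I_K - (1/K) 1 1^T
    return np.sqrt(k / (k - 1)) * (u @ centering)

etf = simplex_etf(num_classes=7, feat_dim=64)
gram = etf.T @ etf  # pairwise inner products between class vectors
off_diagonal = gram[~np.eye(7, dtype=bool)]
```

In this construction every class vector has unit norm and every pair of distinct class vectors has the same cosine similarity, $-1/(K-1)$, which is the equiangular, maximally separated arrangement that the abstract describes.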

Paper Structure

This paper contains 26 sections, 14 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: (a): Local client data are heterogeneous, exhibiting distinct class imbalances. (b): Multi-label data further exacerbate this heterogeneity through diverse label correlations. (c): FedNCA-ML addresses multi-label label-skewed FL challenges by disentangling class-specific features and promoting Neural Collapse–inspired structured clustering in the latent space.
  • Figure 2: Overview of the proposed FedNCA-ML framework for multi-label label-skewed FL. Subfigure (a) shows the overall architecture, while Subfigures (b)–(d) illustrate the Label-Aware Disentanglement Module (LADM) and the regularisation losses. The attention-based LADM extracts label-specific features from image-level features. A predefined ETF matrix serves as both the shared classifier and the source of class-wise query embeddings, ensuring consistent local training across clients. Two regularisation terms are further incorporated to suppress noisy negative features and promote compact intra-class clustering in the latent feature space.
  • Figure 3: t-SNE visualisation of test data feature embeddings on the multi-label DermaMNIST experiment with $\beta = 0.1$, $\gamma = 0.71$. Each colour represents a class. As subfigure (a) shows, without feature disentanglement (LADM) the model clusters samples using undesired information, such as the number of labels per sample.
  • Figure 4: Pairwise cosine similarity of class-wise average feature prototypes. Incorporating LADM, ETF-based alignment, and structure-preserving regularization lowers inter-class similarity, reflecting enhanced separability and discrimination.
  • Figure 5: Examples of attention maps from the multi-label DermaMNIST dataset. Each subfigure shows a sample image alongside the class-specific attention maps generated by the LADM for the corresponding ground-truth labels. The LADM captures class-specific features through its attention mechanism; redder regions indicate higher attention.
  • ...and 7 more figures
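
The Figure 2 caption describes an attention-based LADM in which a predefined ETF matrix supplies both the class-wise queries and the shared classifier. The sketch below shows one way such a mechanism could look, under stated assumptions: single-head scaled dot-product attention with no learned projections, cosine-similarity logits, and the hypothetical names `label_aware_features` and `etf_logits`. None of these details are confirmed by the text above, and the paper's actual module may differ.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def label_aware_features(tokens: np.ndarray, class_queries: np.ndarray):
    """Pool one feature per class from image tokens (illustrative assumption).

    tokens:        (N, D) spatial features of one image
    class_queries: (K, D) fixed class vectors, e.g. rows of an ETF matrix
    Returns class_feats (K, D) and attention maps attn (K, N).
    """
    d = tokens.shape[1]
    scores = class_queries @ tokens.T / np.sqrt(d)  # (K, N) query-token similarity
    attn = softmax(scores, axis=1)                  # each class attends over the N tokens
    return attn @ tokens, attn

def etf_logits(class_feats: np.ndarray, class_queries: np.ndarray) -> np.ndarray:
    """Score class k by the cosine between its feature and its fixed class vector."""
    num = np.sum(class_feats * class_queries, axis=1)
    den = np.linalg.norm(class_feats, axis=1) * np.linalg.norm(class_queries, axis=1)
    return num / (den + 1e-12)

# Demo with a fixed simplex ETF as queries (K = 7 classes, D = 64 dims, N = 49 tokens).
K, D, N = 7, 64, 49
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((D, K)))
queries = (np.sqrt(K / (K - 1)) * u @ (np.eye(K) - np.ones((K, K)) / K)).T  # (K, D)
tokens = rng.standard_normal((N, D))  # stand-in for backbone features

class_feats, attn = label_aware_features(tokens, queries)
logits = etf_logits(class_feats, queries)  # one score per label
```

Because the queries and the classifier are the same fixed matrix in this sketch, no class-specific parameters would need to be averaged across clients, which is one plausible reading of the caption's "ensuring consistent local training across clients". The rows of `attn`, reshaped to the spatial grid, correspond to the kind of per-class attention maps shown in Figure 5.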