Data Heterogeneity and Forgotten Labels in Split Federated Learning

Joana Tirana, Dimitra Tsigkari, David Solans Noguero, Nicolas Kourtellis

TL;DR

Split Federated Learning under data heterogeneity exhibits catastrophic forgetting due to part-1 drift and server-side intra-round forgetting in part-2, which is exacerbated by the server's processing order and cut-layer position. The authors introduce Hydra, a multi-head–inspired mitigation that partitions part-2 into a shared base (part-2a) and multiple heads (part-2b) grouped by label distributions, aggregating into a single final head. Empirical results across MobileNet/ResNet, CIFAR/SVHN/TinyImageNet, and diverse non-IID partitions show Hydra reduces the label-performance gap and backward transfer while boosting global accuracy, often with modest memory/compute overhead. The work highlights the practical potential of structured, grouped higher-layer processing in SFL and opens avenues for theory, client selection, and label-semantic grouping in future research.
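
The grouping and head-aggregation idea described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: clients are grouped by their single most frequent label, each group's head (part-2b) is a flat list of floats, and the final head is a weighted element-wise average. The names `dominant_label`, `group_clients`, and `aggregate_heads` are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_label(labels):
    """Return the most frequent label in one client's local dataset."""
    return Counter(labels).most_common(1)[0][0]


def group_clients(client_labels):
    """Map each dominant label to the client ids that share it.

    Assumption: grouping by dominant label stands in for the paper's
    grouping by label distribution.
    """
    groups = defaultdict(list)
    for cid, labels in client_labels.items():
        groups[dominant_label(labels)].append(cid)
    return dict(groups)


def aggregate_heads(heads, weights=None):
    """Weighted element-wise average of per-group head parameters.

    Assumption: a plain weighted mean is used to merge the heads into
    a single final head; the actual aggregation rule may differ.
    """
    keys = list(heads)
    if weights is None:
        weights = {k: 1.0 for k in keys}
    total = sum(weights[k] for k in keys)
    dim = len(heads[keys[0]])
    return [
        sum(weights[k] * heads[k][i] for k in keys) / total
        for i in range(dim)
    ]


# Three hypothetical clients: two dominated by label 0, one by label 1.
client_labels = {"c0": [0, 0, 0, 1], "c1": [0, 0, 2], "c2": [1, 1, 1, 0]}
groups = group_clients(client_labels)

# One toy head per group, merged with equal weights.
heads = {0: [1.0, 3.0], 1: [3.0, 5.0]}
final_head = aggregate_heads(heads)
```

In this toy setup each head is only ever trained on activations from clients with similar label distributions, so the intra-round forgetting is confined to the shared base; the merge step then produces the single head used for inference.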

Abstract

In Split Federated Learning (SFL), clients collaboratively train a model with the help of a server by splitting the model into two parts. Part-1 is trained locally at each client and aggregated by the aggregator at the end of each round. Part-2 is trained at a server that sequentially processes the intermediate activations received from each client. We study the phenomenon of catastrophic forgetting (CF) in SFL in the presence of data heterogeneity. Due to the nature of SFL, local updates of part-1 may drift away from global optima, while part-2 is sensitive to the processing sequence, similar to forgetting in continual learning (CL). In particular, we observe that the trained model performs better on classes (labels) seen at the end of the sequence. We investigate this phenomenon with emphasis on key aspects of SFL, such as the processing order at the server and the cut layer. Based on our findings, we propose Hydra, a novel mitigation method inspired by multi-head neural networks and adapted to the SFL setting. Extensive numerical evaluations show that Hydra outperforms baselines and methods from the literature.
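
The order sensitivity of part-2 can be seen in a deliberately tiny toy model. The sketch below is an assumption-laden illustration, not the paper's training loop: each model part is a single scalar, each client pulls the parameters toward its own target value (a stand-in for the optimum of its dominant label) with a squared loss, part-1 copies are averaged at the end of the round, and the one shared part-2 is updated in place as the server works through the clients. The function names and hyperparameters are hypothetical.

```python
def sgd_step(w, target, lr):
    """One gradient step on the loss 0.5 * (w - target) ** 2."""
    return w - lr * (w - target)


def sfl_round(part1_global, part2, client_targets, order, lr=0.5, local_steps=3):
    """Run one toy SFL round and return (new part-1, new part-2)."""
    part1_locals = []
    for cid in order:
        target = client_targets[cid]
        w1 = part1_global  # every client starts from the global part-1
        for _ in range(local_steps):
            # Client-side update: drifts toward this client's own optimum.
            w1 = sgd_step(w1, target, lr)
            # Server-side update: the single shared part-2 is overwritten
            # sequentially, so later clients undo earlier clients' progress.
            part2 = sgd_step(part2, target, lr)
        part1_locals.append(w1)
    # Aggregator: plain average of the local part-1 copies.
    new_part1 = sum(part1_locals) / len(part1_locals)
    return new_part1, part2


# Two hypothetical clients with opposite optima.
targets = {"a": -1.0, "b": 1.0}
p1_ab, p2_ab = sfl_round(0.0, 0.0, targets, order=["a", "b"])
p1_ba, p2_ba = sfl_round(0.0, 0.0, targets, order=["b", "a"])
print(p1_ab, p2_ab)  # part-2 ends near client b's optimum
print(p1_ba, p2_ba)  # part-2 ends near client a's optimum
```

In this toy the averaged part-1 is the same for both orders, while part-2 ends close to whichever client was processed last, which mirrors the observation that labels seen at the end of the sequence are favored.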

Paper Structure

This paper contains 38 sections, 5 equations, 19 figures, 9 tables.

Figures (19)

  • Figure 1: (Top) The steps of the processing workflow of SFL. (Bottom) Parameter spaces for low error for clients’ dominant labels (under data heterogeneity), where a client's color represents its dominant label. Part-1 of the model suffers from catastrophic forgetting from round to round. Part-2 suffers from intra-round catastrophic forgetting due to the processing order at the server.
  • Figure 2: Accuracy per processing position (at the server) and global accuracy achieved by MobileNet on CIFAR-10 under non-IID data distributions among $10$ clients.
  • Figure 3: Global accuracy (y-axis) and PG (x-axis) for SFL with different cuts ($4$, $15$, and $23$, resp.) and FL/SplitFedV1.
  • Figure 4: The workflow of Hydra, the proposed method to mitigate catastrophic forgetting in SFL.
  • Figure 5: Performance gap and global accuracy (median) in SFL with and without Hydra with $80\%$-DL. Over-the-bar text shows the percentage of decrease/increase of the PG and global accuracy, respectively.
  • ...and 14 more figures
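
Several captions above report global accuracy alongside a performance gap (PG). This excerpt does not define PG, so the sketch below assumes one plausible reading: the difference between the best and the worst per-label accuracy. The paper's exact definition may differ, and the function names are hypothetical.

```python
def per_label_accuracy(y_true, y_pred):
    """Accuracy computed separately for every label present in y_true."""
    correct, total = {}, {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + int(t == p)
    return {label: correct[label] / total[label] for label in total}


def performance_gap(acc_by_label):
    """Assumed PG: best per-label accuracy minus worst per-label accuracy."""
    return max(acc_by_label.values()) - min(acc_by_label.values())


def global_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions over three labels.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
acc = per_label_accuracy(y_true, y_pred)
gap = performance_gap(acc)
overall = global_accuracy(y_true, y_pred)
```

Under this reading, a model can reach a reasonable global accuracy while still having a large PG, which is why the two quantities are reported together.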