SCALA: Split Federated Learning with Concatenated Activations and Logit Adjustments

Jiarong Yang; Yuan Liu

TL;DR

The paper tackles label distribution skew in Split Federated Learning (SFL), which arises from data heterogeneity and partial client participation. It proposes SCALA, which trains a single server-side model on the concatenated activations of the participating clients and applies logit-adjusted losses on both the server-side and client-side models. The authors provide a theoretical convergence analysis and report experiments in which SCALA improves over baselines across multiple datasets and participation scenarios, including a privacy-enhanced variant. The two mechanisms target skew at different levels: concatenation compensates for skewed local label distributions, and logit adjustment compensates for the variation in label distribution across subsets of participating clients.

Abstract

Split Federated Learning (SFL) is a distributed machine learning framework that divides the learning process between a server and clients and collaboratively trains a shared model by aggregating local models updated on data from distributed clients. However, data heterogeneity and partial client participation result in label distribution skew, which severely degrades learning performance. To address this issue, we propose SFL with Concatenated Activations and Logit Adjustments (SCALA). Specifically, the activations from the client-side models are concatenated as the input of the server-side model so as to centrally adjust the label distribution across different clients, and logit adjustments are applied to the loss functions of both the server-side and client-side models to deal with the label distribution variation across different subsets of participating clients. Theoretical analysis and experimental results on public datasets verify the superiority of the proposed SCALA.
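The logit adjustment mentioned in the abstract can be illustrated with a minimal sketch. The exact loss used by SCALA is not given in this summary, so the code below assumes the standard form of logit adjustment, in which `tau * log(prior)` is added to the logits before the softmax cross-entropy. The function names, the smoothing constant `eps`, the temperature `tau`, and the toy client labels are all hypothetical, as is the choice to estimate the server-side prior from the concatenated labels and the client-side prior from local labels.

```python
import math
from collections import Counter


def label_prior(labels, num_classes, eps=1e-8):
    """Empirical label frequencies; eps keeps log() finite for missing classes (assumption)."""
    counts = Counter(labels)
    total = len(labels)
    return [(counts.get(c, 0) + eps) / (total + eps * num_classes)
            for c in range(num_classes)]


def logit_adjusted_ce(logits, label, prior, tau=1.0):
    """Softmax cross-entropy on logits shifted by tau * log(prior) (assumed standard form)."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, prior)]
    m = max(adjusted)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


# Hypothetical labels held by two participating clients with skewed local data.
client_labels = {"client_a": [0, 0, 0, 1], "client_b": [2, 2, 1, 2]}

# Server side: prior estimated over the labels of the concatenated batch.
concat_labels = [y for ys in client_labels.values() for y in ys]
server_prior = label_prior(concat_labels, num_classes=3)

# Client side: prior estimated from that client's own labels.
client_prior = label_prior(client_labels["client_a"], num_classes=3)

logits = [2.0, 1.0, 0.5]
loss_uniform = logit_adjusted_ce(logits, 1, [1 / 3] * 3)  # equals plain cross-entropy
loss_server = logit_adjusted_ce(logits, 1, server_prior)  # class 1 is the rarest here
```

With a uniform prior the shift is the same for every class and cancels in the softmax, so the loss reduces to ordinary cross-entropy. With a skewed prior, samples from rare classes incur a larger loss, which pushes the model to separate them with a larger margin.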

Paper Structure (26 sections, 5 theorems, 49 equations, 7 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Federated Learning
Split Learning
Split Federated Learning
Summary and Motivation for SCALA
Proposed Method: SCALA
Motivations Behind the Two Proposed Modules in SCALA
Concatenated Activations Enabled SFL Framework
Preliminaries
Algorithm Description
Logit Adjustments in Loss Functions
Theoretical Analysis for SCALA
Convergence Analysis of Concatenated Activations Enabled SFL
Analysis on Update Process of the Classifier of SCALA
...and 11 more sections

Key Result

Theorem 1

Under Assumptions 1-4, denote $F^*=\min_{\mathbf{w}}F(\mathbf{w})$, $\sigma_{\max}^2=\max_{n\in[N]}\{\sigma_n^2\}$ and $\kappa_{\max}^2=\max_{n\in[N]}\{\kappa_n^2\}$, let $\rho$ be the client participation ratio, $T$ be the total global iterations and $I$ be the number of local i

Figures (7)

Figure 1: An illustration of traditional SFL and SCALA in scenarios with skewed label distribution. Traditional SFL maintains and trains server-side models for each participating client on the server and periodically aggregates these server-side models. SCALA maintains and trains one server-side model based on concatenated activations.
Figure 2: Concatenated activations enabled SFL framework. All participating clients synchronously execute local iterations, where the client-side models are updated locally and sent to the server for aggregation at the $I$-th local iteration. The server-side model is centrally updated in each local iteration, where the input is concatenated from the activations uploaded by participating clients.
Figure 3: The process of concatenating activations. The activations uploaded by participating clients are concatenated to serve as the input for the server-side model, effectively mitigating the issue of missing classes under a highly skewed local label distribution.
Figure 4: Test accuracy of SCALA compared with alternative SFL configurations.
Figure 5: Per-class test accuracy of SCALA compared with alternative SFL configurations.
...and 2 more figures
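
The concatenation step described in the captions of Figures 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: activations are represented as plain Python lists, concatenation is taken along the batch dimension, and the `offsets` bookkeeping (which would let a server route per-sample gradients back to each client) is a hypothetical addition not described in this summary.

```python
def concatenate_activations(uploads):
    """Stack per-client activations and labels into one server-side batch.

    uploads maps a client id to (activations, labels), one activation per sample.
    Returns the concatenated activations, the concatenated labels, and
    (client_id, start, end) offsets into the concatenated batch.
    """
    acts, labels, offsets = [], [], []
    for client_id, (client_acts, client_labels) in uploads.items():
        if len(client_acts) != len(client_labels):
            raise ValueError(f"{client_id}: activations and labels differ in length")
        offsets.append((client_id, len(acts), len(acts) + len(client_acts)))
        acts.extend(client_acts)
        labels.extend(client_labels)
    return acts, labels, offsets


# Hypothetical uploads: each client only holds a subset of the three classes.
uploads = {
    "client_a": ([[0.1, 0.9], [0.2, 0.8]], [0, 0]),
    "client_b": ([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]], [1, 2, 2]),
}

server_inputs, server_labels, offsets = concatenate_activations(uploads)

# Classes seen by each client alone versus by the single server-side model.
classes_per_client = {cid: sorted(set(y)) for cid, (_, y) in uploads.items()}
classes_on_server = sorted(set(server_labels))
```

In this toy case neither client sees all three classes locally, but the concatenated batch does, which is the missing-class effect that Figure 3 describes.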

Theorems & Definitions (10)

Theorem 1
proof
Theorem 2
proof
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
