GAS: Generative Activation-Aided Asynchronous Split Federated Learning

Jiarong Yang, Yuan Liu

TL;DR

GAS addresses the latency and bias challenges of asynchronous Split Federated Learning by introducing activation and model buffers that decouple update timing from communication delays. A key innovation is generative activations: the server maintains label-wise activation distributions and samples from them to generate activations that balance updates, reducing gradient dissimilarity and encouraging more reliable server-side updates. Theoretical analysis yields tighter convergence bounds for both server- and client-side models, with a decaying learning rate helping to mitigate straggler effects over time. Empirical results on CIFAR-10, CINIC-10, and Fashion-MNIST under heterogeneous data demonstrate that GAS outperforms both asynchronous FL baselines and synchronous SFL methods in accuracy and convergence speed, highlighting its practical potential for real-world, heterogeneous networks.

Abstract

Split Federated Learning (SFL) splits a shared model between clients and a server and trains it collaboratively, with clients transmitting activations and client-side models to the server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to the server. However, due to significant variations in computational and communication capabilities among clients, activations and client-side models arrive at the server asynchronously. The delay caused by asynchrony significantly degrades the performance of SFL. To address this issue, we consider an asynchronous SFL framework in which an activation buffer and a model buffer are embedded on the server to manage the asynchronously transmitted activations and client-side models, respectively. Furthermore, because asynchronous activation transmissions cause the buffer to receive activations more frequently from resource-rich clients, leading to biased updates of the server-side model, we propose Generative activations-aided Asynchronous SFL (GAS). In GAS, the server maintains an activation distribution for each label based on received activations and generates activations from these distributions according to the degree of bias. These generative activations are then used to assist in updating the server-side model, ensuring more accurate updates. We derive a tighter convergence bound, and our experiments demonstrate the effectiveness of the proposed method. The code is available at https://github.com/eejiarong/GAS.
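
The label-wise generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a diagonal Gaussian per label (tracked with Welford's running mean and variance) and a simple "top up every seen label to the most frequent label's count" rule as the measure of bias. The names `LabelwiseGaussian` and `generate_balancing_activations` are hypothetical.

```python
import random
from collections import Counter, defaultdict


class LabelwiseGaussian:
    """Running per-dimension mean/variance of activations for each label.

    Assumption: a diagonal Gaussian per label; the summary does not state
    which distribution family the paper uses.
    """

    def __init__(self, dim):
        self.dim = dim
        self.count = defaultdict(int)
        self.mean = defaultdict(lambda: [0.0] * dim)
        self.m2 = defaultdict(lambda: [0.0] * dim)  # sum of squared deviations

    def update(self, activation, label):
        """Fold one received activation into its label's statistics (Welford)."""
        self.count[label] += 1
        n = self.count[label]
        mean, m2 = self.mean[label], self.m2[label]
        for i, x in enumerate(activation):
            delta = x - mean[i]
            mean[i] += delta / n
            m2[i] += delta * (x - mean[i])

    def sample(self, label, rng):
        """Draw one synthetic activation for a label that has been seen."""
        n = self.count[label]
        mean, m2 = self.mean[label], self.m2[label]
        return [
            rng.gauss(mean[i], (m2[i] / (n - 1)) ** 0.5 if n > 1 else 0.0)
            for i in range(self.dim)
        ]


def generate_balancing_activations(buffer_labels, dists, rng):
    """Generate activations for labels under-represented in the buffer.

    Assumption: "degree of bias" is approximated by the gap between each
    label's count and the most frequent label's count in the buffer.
    """
    counts = Counter(buffer_labels)
    if not counts:
        return []
    target = max(counts.values())
    generated = []
    for label in list(dists.count):  # only labels the server has seen
        for _ in range(target - counts.get(label, 0)):
            generated.append((dists.sample(label, rng), label))
    return generated


# Usage: label 0 dominates the buffer, so label 1 is topped up.
rng = random.Random(0)
dists = LabelwiseGaussian(dim=2)
for act, y in [([1.0, 2.0], 0), ([3.0, 4.0], 0), ([0.0, 0.0], 1), ([2.0, 2.0], 1)]:
    dists.update(act, y)
extra = generate_balancing_activations([0, 0, 0, 1], dists, rng)
```

In this toy run the buffer holds three activations with label 0 and one with label 1, so two synthetic label-1 activations are produced; the real activations and the generated ones would then be concatenated for the server-side update.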

Paper Structure (29 sections, 2 theorems, 35 equations, 4 figures, 5 tables, 2 algorithms)

Key Result

Lemma 1

By introducing generative activations, the server-side model update achieves a tighter bound on gradient dissimilarity (the bound itself is stated in the paper). The proof can be found in Technical Appendix C.

Figures (4)

  • Figure 1: The framework of GAS. The client-side model is updated through four steps: ① Clients perform forward propagation; ④ The server receives the activations and computes backpropagated gradients; ⑤ Clients receive the gradients to update the client-side models, completing a local iteration. After finishing their local iterations, clients send the updated client-side models to the server; ⑥ The server stores these models in the model buffer and, when the buffer is full, aggregates them to complete a global iteration. The server-side model is updated through two steps: ② Received activations update the distributions of activations. When the activation buffer is full, the server generates activations from these distributions; ③ Activations are stored in the buffer and, when the buffer is full, the server concatenates them with the generative activations to update the server-side model.
  • Figure 2: Impact of generative activations on gradient dissimilarity and convergence performance.
  • Figure 3: Test accuracy of GAS compared with the baseline methods on CIFAR-10 and CINIC-10.
  • Figure 4: Impact of local iterations on the performance of GAS compared to baseline methods.
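
The two buffers in the Figure 1 caption can be sketched as follows. This is an illustrative reading of the caption rather than the released code: it assumes uniform averaging of client-side models (the paper may weight them differently), represents a model as a flat list of floats, and takes a hypothetical `generate` callback that returns synthetic `(activation, label)` pairs for the labels currently in the buffer.

```python
class ModelBuffer:
    """Collects client-side models and aggregates them once the buffer is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.models = []

    def push(self, weights):
        """Store one client-side model; return the average when the buffer fills."""
        self.models.append(list(weights))
        if len(self.models) < self.capacity:
            return None
        # Assumption: plain unweighted averaging across buffered models.
        aggregated = [sum(ws) / len(self.models) for ws in zip(*self.models)]
        self.models.clear()
        return aggregated


class ActivationBuffer:
    """Collects (activation, label) pairs and emits a server-side batch when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, activation, label, generate):
        """Store one activation; when full, concatenate with generated ones."""
        self.items.append((activation, label))
        if len(self.items) < self.capacity:
            return None
        # `generate` is a hypothetical hook standing in for sampling from the
        # label-wise activation distributions.
        batch = self.items + generate([y for _, y in self.items])
        self.items = []
        return batch


# Usage: two clients report models; the second arrival triggers aggregation.
models = ModelBuffer(capacity=2)
first = models.push([1.0, 2.0])
second = models.push([3.0, 4.0])

acts = ActivationBuffer(capacity=2)
acts.push([0.5], 0, generate=lambda labels: [])
batch = acts.push([0.7], 0, generate=lambda labels: [([0.1], 1)])
```

Because each buffer fires on fill rather than on a round boundary, the server never waits for a specific slow client, which is the decoupling the caption describes.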

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1