HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Qiyuan Chen, Xian Wu, Yi Wang, Xianhao Chen

Abstract

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can significantly reduce memory footprints, it typically suffers from prohibitively degraded convergence speed. To resolve this dilemma, we propose Hybrid-Order Split Federated Learning (HO-SFL). By reformulating the split learning process within a Lagrangian framework, HO-SFL decouples the optimization landscape: The server performs precise first-order updates (i.e., BP), whereas clients conduct memory-efficient zeroth-order optimization. This hybrid design not only eliminates the need for client-side BP but also enables dimension-free model aggregation, drastically lowering communication costs. Crucially, we provide a theoretical convergence analysis, demonstrating that HO-SFL mitigates the dimension-dependent convergence slowdown of zeroth-order optimization, achieving a convergence rate comparable to first-order methods. Extensive experiments on tasks across vision and language modalities validate that HO-SFL achieves convergence speeds comparable to first-order baselines while significantly reducing communication costs and client memory footprints.

Paper Structure (48 sections, 9 theorems, 84 equations, 8 figures, 3 tables, 3 algorithms)

Key Result

Proposition 4.1

Let $\Gamma$ be the regularity bound on the gradient magnitudes and Hessian spectral norms. The client-side zeroth-order estimator $\hat{\boldsymbol{g}}_{c}^t$ exhibits the following properties:

  1. Bias Control (from Lemma lemma:global_bias): The estimator is biased due to the curvature of the loss
  2. Second Moment Bound (from Lemma lemma:client_moment_bound): The second moment of the aggregated
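
A first-order expansion gives some intuition for why the scalar projections carry gradient information and where a $\mu$-dependent bias can enter. This is only a sketch, not the paper's lemma. It assumes that $f_c$ is twice continuously differentiable in $\boldsymbol{\theta}_c$ with bounded second derivatives and that the directions $\boldsymbol{u}_p^t$ are standard Gaussian; neither assumption is stated in this excerpt. The Jacobian symbol $J_m$ (of $f_c(\boldsymbol{x}_m;\cdot)$ at $\boldsymbol{\theta}_c^t$) is notation introduced here.

$$
\tilde{\boldsymbol{z}}_{m,p}^t-\boldsymbol{z}_m^t=\mu J_m\boldsymbol{u}_p^t+O(\mu^2),
\qquad
v_{m,p}^t=\boldsymbol{\lambda}_m^{t\top}\left(\tilde{\boldsymbol{z}}_{m,p}^t-\boldsymbol{z}_m^t\right)=\mu\left(J_m^\top\boldsymbol{\lambda}_m^t\right)^\top\boldsymbol{u}_p^t+O(\mu^2).
$$

By the chain rule, $J_m^\top\boldsymbol{\lambda}_m^t=\nabla_{\boldsymbol{\theta}_c}\ell$ with the server parameters held fixed. Using $\mathbb{E}[\boldsymbol{u}\boldsymbol{u}^\top]=I$,

$$
\mathbb{E}\left[\frac{1}{\mu}v_{m,p}^t\,\boldsymbol{u}_p^t\right]=\nabla_{\boldsymbol{\theta}_c}\ell+O(\mu).
$$

In this sketch the remainder depends only on second derivatives, so the deviation from the true gradient vanishes as $\mu\to 0$.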

Figures (8)

  • Figure 1: Overview of the HO-SFL training loop. Each selected client $m$ computes an activation $\boldsymbol{z}_m^t=f_c(\boldsymbol{x}_m;\boldsymbol{\theta}_c^t)$ and sends $(\boldsymbol{z}_m^t,y_m)$ to the server. The server performs BP to update $\boldsymbol{\theta}_s$ and returns the activation-gradient feedback $\boldsymbol{\lambda}_m^t=\nabla_{\boldsymbol{z}_m}\ell$. In parallel, the client runs $P$ ZO perturbation forward passes $\tilde{\boldsymbol{z}}_{m,p}^t=f_c(\boldsymbol{x}_m;\boldsymbol{\theta}_c^t+\mu\boldsymbol{u}_p^t)$ and computes scalar projections $v_{m,p}^t=\boldsymbol{\lambda}_m^{t\top}(\tilde{\boldsymbol{z}}_{m,p}^t-\boldsymbol{z}_m^t)$. The server then aggregates these scalars into $\bar{v}_p^t=\frac{1}{K}\sum_{m\in\mathcal{S}_t}v_{m,p}^t$, which are broadcast back for clients to construct $\hat{\boldsymbol{g}}_c^t=\frac{1}{P\mu}\sum_{p=1}^P\bar{v}_p^t\boldsymbol{u}_p^t$ and update $\boldsymbol{\theta}_c$.
  • Figure 2: Comparison of system efficiency. (a) Standard SFL incurs significant idle time on the client side waiting for gradients. (b) HO-SFL effectively masks the computational cost of multiple client-side zeroth-order perturbations by overlapping them with the server's backpropagation and communication processes.
  • Figure 3: Validation accuracy convergence on CIFAR-10 under IID (left) and Non-IID (right) settings. Solid lines denote the mean performance, and shaded regions represent the standard deviation across 10 independent random seeds.
  • Figure 4: Communication and memory profiling.
  • Figure 5: Feasibility analysis of latency hiding under constrained edge resources.
  • ...and 3 more figures
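
The training loop described in the Figure 1 caption can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy client model (one tanh layer), the toy squared loss on the server, the use of a single sample per client, and a shared random generator that gives every client the same directions $\boldsymbol{u}_p^t$ are all assumptions made here. The function names `client_forward`, `server_feedback`, `scalar_projections`, and `ho_sfl_round` are hypothetical.

```python
import numpy as np

def client_forward(theta_c, x):
    """Toy client sub-model f_c(x; theta_c): one tanh layer (an assumption)."""
    return np.tanh(theta_c @ x)

def server_feedback(theta_s, z, y):
    """Server-side BP for the toy loss 0.5 * (theta_s . z - y)^2.

    Returns the gradient w.r.t. theta_s and lambda = d loss / d z.
    """
    err = theta_s @ z - y
    return err * z, err * theta_s

def scalar_projections(theta_c, x, z, lam, directions, mu):
    """v_p = lam^T (f_c(x; theta_c + mu * u_p) - z), one scalar per direction."""
    return np.array([lam @ (client_forward(theta_c + mu * u, x) - z)
                     for u in directions])

def ho_sfl_round(theta_c, theta_s, data, rng, P=8, mu=1e-4, lr_c=0.05, lr_s=0.05):
    """One round over the selected clients; `data` is a list of (x_m, y_m)."""
    # Assumption: all clients use the same directions u_p (e.g. a shared seed),
    # so that averaging the scalars v_{m,p} across clients is meaningful.
    directions = rng.standard_normal((P,) + theta_c.shape)
    v = np.zeros((len(data), P))
    grad_s = np.zeros_like(theta_s)
    for m, (x, y) in enumerate(data):
        z = client_forward(theta_c, x)              # client -> server: (z_m, y_m)
        g_s, lam = server_feedback(theta_s, z, y)   # server -> client: lambda_m
        grad_s += g_s / len(data)
        v[m] = scalar_projections(theta_c, x, z, lam, directions, mu)
    v_bar = v.mean(axis=0)                          # only P scalars are aggregated
    g_hat = np.tensordot(v_bar, directions, axes=1) / (P * mu)
    return theta_c - lr_c * g_hat, theta_s - lr_s * grad_s, g_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_c = 0.5 * rng.standard_normal((3, 4))
    theta_s = rng.standard_normal(3)
    data = [(rng.standard_normal(4), 1.0), (rng.standard_normal(4), -1.0)]
    theta_c, theta_s, g_hat = ho_sfl_round(theta_c, theta_s, data, rng)
    print(g_hat.shape)
```

In this sketch, the quantity averaged across clients is the length-$P$ vector `v_bar`, whose size does not depend on the number of client parameters. The sketch runs the perturbation passes sequentially and does not model the overlap with server-side computation shown in Figure 2.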

Theorems & Definitions (18)

  • Proposition 4.1: Bias and Variance Decomposition
  • Theorem 4.2: Convergence Bound of HO-SFL
  • Corollary 4.3: Convergence Rate of HO-SFL
  • proof
  • Remark 4.4
  • Definition 1.1: Regularity Bound $\Gamma$
  • Lemma 1.2: Conditional Bound on Local ZO Estimator
  • proof
  • Lemma 1.3: Server-side Second Moment Bound
  • proof
  • ...and 8 more