Non-Federated Multi-Task Split Learning for Heterogeneous Sources

Yilin Zheng, Atilla Eryilmaz

TL;DR

This work tackles heterogeneity in edge-network data by moving from Federated Learning to Multi-Task Split Learning (MTSL), where each task m maintains its own model F_m split between a server component G(φ, ·) and a client component H_m(ψ_m, ·). The framework avoids explicit gradient federation and communicates smashed data and partial gradients, enabling per-task LR tuning and potential gains in convergence speed and communication efficiency; convergence bounds are provided for convex and non-convex objectives under Lipschitz-gradient assumptions, with SGD results also discussed. Empirical results on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 show that MTSL achieves higher multi-task accuracy and greater robustness to heterogeneity and noise, while reducing training steps and data transmission compared to FedAvg, FedEM, and SplitFed, particularly when data are highly non-i.i.d. The work positions MTSL as a practical alternative to FL in highly heterogeneous edge environments and outlines future directions for privacy-preserving enhancements and dynamic adaptation to data-source heterogeneity.
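
The exchange of smashed data and partial gradients described above can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that the client part H_m(ψ_m, ·) and the server part G(φ, ·) are both linear maps and that the loss is a mean squared error; the function names `client_forward`, `server_step`, and `client_backward` are hypothetical and not taken from the paper.

```python
import numpy as np

def client_forward(psi_m, x):
    """Client part H_m(psi_m, x); a plain linear map is assumed here."""
    return x @ psi_m  # smashed data s_m, shape (n, k)

def server_step(phi, s_m, y_m):
    """Server part G(phi, s_m) with an assumed mean-squared-error loss."""
    resid = s_m @ phi - y_m
    loss = 0.5 * np.mean(resid ** 2)
    grad_phi = s_m.T @ resid / len(y_m)        # gradient w.r.t. server parameters
    grad_s = np.outer(resid, phi) / len(y_m)   # partial gradient returned to the client
    return loss, grad_phi, grad_s

def client_backward(x, grad_s):
    """Chain rule on the client: dLoss/dpsi_m = x^T (dLoss/ds_m)."""
    return x.T @ grad_s

# Hypothetical toy data for one task m.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = rng.normal(size=8)
psi = rng.normal(size=(5, 3))
phi = rng.normal(size=3)

s = client_forward(psi, x)                       # client -> server: s_m (and labels)
loss, grad_phi, grad_s = server_step(phi, s, y)  # server: loss and gradients
grad_psi = client_backward(x, grad_s)            # server -> client: grad_s only
```

In this sketch the client never sends its raw inputs or its parameters ψ_m, and the server never sends φ; only `s`, the labels, and `grad_s` cross the split.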

Abstract

With the development of edge networks and mobile computing, the need to serve heterogeneous data sources at the network edge requires the design of new distributed machine learning mechanisms. As a prevalent approach, Federated Learning (FL) employs parameter-sharing and gradient-averaging between clients and a server. Despite its many favorable qualities, such as convergence and data-privacy guarantees, it is well-known that classic FL fails to address the challenge of data heterogeneity and computation heterogeneity across clients. Most existing works that aim to accommodate such sources of heterogeneity stay within the FL operation paradigm, with modifications to overcome the negative effect of heterogeneous data. In this work, as an alternative paradigm, we propose a Multi-Task Split Learning (MTSL) framework, which combines the advantages of Split Learning (SL) with the flexibility of distributed network architectures. In contrast to the FL counterpart, in this paradigm, heterogeneity is not an obstacle to overcome, but a useful property to take advantage of. As such, this work aims to introduce a new architecture and methodology to perform multi-task learning for heterogeneous data sources efficiently, with the hope of encouraging the community to further explore the potential advantages we reveal. To support this promise, we first show through theoretical analysis that MTSL can achieve fast convergence by tuning the learning rate of the server and clients. Then, we compare the performance of MTSL with existing multi-task FL methods numerically on several image classification datasets to show that MTSL has advantages over FL in training speed, communication cost, and robustness to heterogeneous data.

Paper Structure

This paper contains 13 sections, 3 theorems, 32 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

Under Assumptions assmp:l_gradient through assmp:b_gradient, using gradient descent as the optimization method with learning rate $\boldsymbol{\eta} = (\eta_s, \eta_1, \ldots, \eta_M)^{\intercal}$ satisfying $\eta_m \leq \frac{1}{L_m}, \forall m$, the MTSL framework has the following convergence results:

Figures (4)

  • Figure 1: Multi-Task Split Learning Framework. Each client uploads only its smashed data $s_m$ and label $Y_m$ to the server. The server calculates the loss and backpropagates through the split network to each client.
  • Figure 2: Effect of learning rate tuning for a linear model with quadratic loss. (a) Using separate networks for tasks 1 and 2. (b) MTSL setup with common LR $\eta_s=\eta_1=\eta_2=0.01$. (c) MTSL setup with $\eta_1=\eta_2=0.01$ and decreased server LR $\eta_s=0.002$. (d) MTSL setup with $\eta_s=0.002$, $\eta_2=0.01$ and increased LR for client 1: $\eta_1=0.02$. (e) MTSL setup with $\eta_s=0.002$, $\eta_1=0.01$ and increased LR for client 2: $\eta_2=0.02$.
  • Figure 3: Training cost of different algorithms on the MNIST dataset as the testing accuracy increases ($\alpha=0$). (a) Number of training steps needed to reach a given accuracy. (b) Amount of data (smashed data, gradients, parameters) transmitted to reach a given accuracy.
  • Figure 4: Performance of different algorithms on the MNIST dataset as the noise level changes. (a) Changing the data heterogeneity parameter $\alpha$. (b) Adding pixel-wise zero-mean Gaussian noise with different standard deviation $\sigma$ ($\alpha=0$).
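
The learning-rate settings listed for Figure 2 can be explored with a small toy experiment. The sketch below is not the paper's experiment: it assumes synthetic regression data, a linear client map ψ_m and a shared linear server vector φ, a quadratic loss summed over two tasks, and a server gradient accumulated over tasks. The helper names `make_task` and `train_mtsl` are hypothetical.

```python
import numpy as np

def make_task(rng, n=50, d=5):
    """Synthetic regression task (hypothetical data, not the paper's)."""
    x = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    return x, x @ w_true

def train_mtsl(tasks, eta_s, eta_clients, steps=500, k=3, seed=0):
    """Gradient descent on a shared phi and per-task psi_m with separate LRs."""
    rng = np.random.default_rng(seed)
    d = tasks[0][0].shape[1]
    phi = 0.5 * rng.normal(size=k)
    psis = [0.5 * rng.normal(size=(d, k)) for _ in tasks]
    history = []
    for _ in range(steps):
        total, grad_phi, grad_psis = 0.0, np.zeros(k), []
        for (x, y), psi in zip(tasks, psis):
            s = x @ psi                          # smashed data from client m
            resid = s @ phi - y
            total += 0.5 * np.mean(resid ** 2)   # quadratic loss of task m
            grad_phi += s.T @ resid / len(y)     # server gradient, summed over tasks (assumed)
            grad_psis.append(x.T @ np.outer(resid, phi) / len(y))
        history.append(total)                    # loss before this step's update
        phi = phi - eta_s * grad_phi             # server update with eta_s
        psis = [psi - eta * g                    # client updates with eta_m
                for psi, eta, g in zip(psis, eta_clients, grad_psis)]
    return history

rng = np.random.default_rng(1)
tasks = [make_task(rng), make_task(rng)]

# LR values taken from the Figure 2 caption, settings (b) and (c).
common = train_mtsl(tasks, eta_s=0.01, eta_clients=[0.01, 0.01])
slow_server = train_mtsl(tasks, eta_s=0.002, eta_clients=[0.01, 0.01])
print(f"common LR:         {common[0]:.3f} -> {common[-1]:.3f}")
print(f"reduced server LR: {slow_server[0]:.3f} -> {slow_server[-1]:.3f}")
```

Because both runs share the same seed and data, they start from the same loss, so any difference in the final value comes from the learning-rate vector alone. Which setting ends lower depends on the synthetic data and should not be read as a reproduction of Figure 2.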

Theorems & Definitions (3)

  • Proposition 1: Non-Stochastic Case
  • Proposition 2: Stochastic Case
  • Corollary 1