Client-Cooperative Split Learning

Haiyu Deng; Yanna Jiang; Guangsheng Yu; Qin Wang; Xu Wang; Wei Ni; Shiping Chen; Ren Ping Liu

TL;DR

CliCooper is a multi-client cooperative SL framework for model training services in heterogeneous and partially trusted environments, where one client contributes data and the others collectively act as SL trainers.

Abstract

Model training is increasingly offered as a service for resource-constrained data owners to build customized models. Split Learning (SL) enables such services by offloading training computation under privacy constraints, and is evolving toward serverless and multi-client settings where model segments are distributed across training clients. This cooperative mode assumes partial trust: data owners hide labels and data from trainer clients, while trainer clients produce verifiable training artifacts and ownership proofs. We present CliCooper, a multi-client cooperative SL framework tailored for cooperative model training services in heterogeneous and partially trusted environments, where one client contributes data, while others collectively act as SL trainers. CliCooper bridges the privacy and trust gaps through two new designs. First, differential privacy-based activation protection and secret label obfuscation safeguard data owners' privacy without degrading model performance. Second, a dynamic chained watermarking scheme cryptographically links training stages on model segments across trainers, ensuring verifiable training integrity, robust model provenance, and copyright protection. Experiments show that CliCooper preserves model accuracy while enhancing resilience to privacy and ownership attacks. It reduces the success rate of clustering attacks (which infer label groups from intermediate activations) to 0%, decreases the similarity achieved by inversion-reconstruction attacks (which recover training data) from 0.50 to 0.03, and limits model-extraction-based surrogates to about 1% accuracy, comparable to random guessing.
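The abstract says the chained watermarking scheme cryptographically links training stages across trainers, but it does not give the construction. As a rough illustration of what "chained" linking can mean, the sketch below builds a plain SHA-256 hash chain over per-trainer segment digests. Everything in it is an assumption made for illustration: the function names (`link`, `build_chain`, `verify_chain`), the genesis value, the trainer identifiers, and the use of a hash chain at all. It is not CliCooper's watermarking scheme.

```python
import hashlib


def _field(data: bytes) -> bytes:
    # Length-prefix each field so concatenated inputs cannot be confused.
    return len(data).to_bytes(8, "big") + data


def link(prev_link: bytes, trainer_id: str, segment_digest: bytes) -> bytes:
    """Derive the next chain link from the previous link and this stage's data."""
    h = hashlib.sha256()
    h.update(_field(prev_link))
    h.update(_field(trainer_id.encode("utf-8")))
    h.update(_field(segment_digest))
    return h.digest()


def build_chain(genesis: bytes, stages):
    """stages: list of (trainer_id, segment_bytes) in pipeline order (hypothetical)."""
    links = []
    prev = genesis
    for trainer_id, segment_bytes in stages:
        digest = hashlib.sha256(segment_bytes).digest()
        prev = link(prev, trainer_id, digest)
        links.append(prev)
    return links


def verify_chain(genesis: bytes, stages, links) -> bool:
    """Recompute the chain and compare it with the claimed links."""
    return build_chain(genesis, stages) == links
```

In this toy chain, changing any earlier segment changes every later link, which is the property that makes a chain useful for tying stages together.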

Paper Structure (34 sections, 1 theorem, 12 equations, 4 figures, 7 tables, 3 algorithms)

Introduction
Technical Warm-ups
Split Learning Architectures
Watermarking
System Model
Entities
Operational Setting and Deployment Context
Threat Models
Design of CliCooper
SL Protocol
Secret-mapping Expansion and DP-based Protection
Secret-mapping label expansion
DP-protected activations
Training with Chained Watermarking
Pipeline training
...and 19 more sections

Key Result

Theorem 1

Let $\bar{A}(x)=\textsf{Clip}^{(1)}_{S}\!(\mathbb{M}_{\mathcal{C}}(W_{\mathcal{C}},x))\in\mathbb{R}^d$ be the per-sample activation after per-layer $\ell_{1}$-clipping with radius $S$. The $\ell_{1}$-sensitivity satisfies $\Delta_{1}=\sup_{x\sim x'}\|\bar{A}(x)-\bar{A}(x')\|_{1}\le 2S$. Consider the where $\,t\ \text{generated }(y_a \wedge y_b)\,$ denotes the event that both DP activations $y_a$ a
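The recoverable part of the theorem states that $\ell_{1}$-clipping with radius $S$ bounds the $\ell_{1}$-sensitivity of the activation by $2S$. A minimal sketch of how such a bound is typically used follows, assuming the Laplace mechanism with scale $2S/\varepsilon$. The excerpt does not say which noise distribution the paper uses, so the Laplace choice and the helper names (`clip_l1`, `laplace_noise`, `dp_activation`) are assumptions.

```python
import random


def clip_l1(activation, radius):
    """Scale the vector so that its l1 norm is at most `radius`."""
    norm = sum(abs(v) for v in activation)
    if norm <= radius:
        return list(activation)
    scale = radius / norm
    return [v * scale for v in activation]


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_activation(activation, radius, epsilon, rng):
    """Clip, then add Laplace noise calibrated to the 2S sensitivity bound."""
    clipped = clip_l1(activation, radius)
    scale = 2.0 * radius / epsilon  # assumed Laplace mechanism: b = Delta_1 / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


rng = random.Random(0)
noisy = dp_activation([3.0, -4.0, 1.0], radius=2.0, epsilon=5.0, rng=rng)
```

The $2S$ bound follows from the triangle inequality: two clipped vectors each have $\ell_{1}$ norm at most $S$, so their difference has $\ell_{1}$ norm at most $2S$.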

Figures (4)

Figure 1: CliCooper design. The data client $\mathcal{C}$ holds the dataset but has very limited compute, so it collaborates with multiple trainer clients $\mathcal{T}$ for split learning. First, to avoid disclosing the ground-truth labels $Y$, $\mathcal{C}$ expands $Y$ into pseudo-labels $Y^*$ whose quantity and semantics differ from those of $Y$. Next, $\mathcal{C}$ uses a lightweight encoder to embed features and applies DP to protect the activations. The trainers $\mathcal{T}$ then train the model on the DP-protected activations $\mathbb{M}_{\mathcal{C}}^{\text{\tiny DP}}(W_{\mathcal{C}},\mathbb{D}^*)$ and $Y^*$ for $N$ epochs until convergence. In the $(N{+}1)$-th epoch, each trainer in $\mathcal{T}$ embeds a chained watermark into its assigned subnetwork in a pipelined manner, asserting an ownership claim over its contribution so that it can obtain compensation.
Figure 2: Comparison of main accuracy with the baseline across different architectures and datasets, where $\varepsilon=5.0$, $\gamma=2.0$, and $B \in \{512, 1024, 2048\}$.
Figure 3: Comparison of main accuracy with the baseline across different architectures and datasets, where $\varepsilon=5.0$, $B=1024$, and $\gamma \in \{1.5, 2.0, 2.5\}$.
Figure 4: Comparison of main accuracy with the baseline across different architectures and datasets, where $B=1024$, $\gamma=2.0$, and $\varepsilon \in \{2.0, 5.0, 10.0\}$.
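The Figure 1 caption describes expanding the true labels $Y$ into pseudo-labels $Y^*$ that differ in quantity and semantics. One simple way to picture this is a secret one-to-many mapping that only the data client holds. The sketch below is an assumption for illustration, not the paper's construction: the integer `expansion` factor, the shuffled id pool, and the function names are hypothetical, and `expansion` is not claimed to correspond to the $\gamma$ used in the figures.

```python
import random


def build_secret_mapping(num_classes, expansion, rng):
    """Give each true class `expansion` pseudo-label ids from a shuffled pool."""
    pool = list(range(num_classes * expansion))
    rng.shuffle(pool)  # the shuffle hides which ids belong together
    forward = {
        c: pool[c * expansion:(c + 1) * expansion] for c in range(num_classes)
    }
    inverse = {p: c for c, ps in forward.items() for p in ps}
    return forward, inverse


def expand_labels(labels, forward, rng):
    """Replace each true label with one of its pseudo-labels, chosen at random."""
    return [rng.choice(forward[y]) for y in labels]


def recover_labels(pseudo_labels, inverse):
    """Only the holder of the secret inverse map can undo the expansion."""
    return [inverse[p] for p in pseudo_labels]


rng = random.Random(0)
forward, inverse = build_secret_mapping(num_classes=3, expansion=2, rng=rng)
pseudo = expand_labels([0, 1, 2, 1, 0], forward, rng)
```

In this sketch the trainers would only ever see `pseudo`, while the data client keeps `inverse` to map predictions back to true classes.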

Theorems & Definitions (2)

Theorem 1: DP-protected activations
Proof
