Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

Seungeun Oh; Sihun Baek; Jihong Park; Hyelin Nam; Praneeth Vepakomma; Ramesh Raskar; Mehdi Bennis; Seong-Lyun Kim

TL;DR

This work proposes DP-CutMixSL, a privacy-preserving split learning (SL) framework that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients. The analysis shows that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation.

Abstract

In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) for improved accuracy and robustness. However, ViT's large model size and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. SL nevertheless requires additional information exchange for weight updates between the device and the server, and this exchange can expose private training data to various attacks. To mitigate the risk of data breaches in classification tasks, and inspired by CutMix regularization, we propose DP-CutMixSL, a privacy-preserving SL framework that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients. Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared to DP-SL and DP-MixSL.

Paper Structure (29 sections, 5 theorems, 17 equations, 11 figures, 8 tables, 1 algorithm)

Introduction
Related Works
Federated & Split Learning.
Privacy Attacks & Differential Privacy.
Motivation: Privacy-Preserving Parallel SL in ViT
Proposed: Split Learning With Random CutMix for ViT
Differential Privacy Analysis on Smashed Data
Privacy-Accuracy Trade-Off ($k=n$).
RDP-CDP Conversion.
Revisiting Privacy-Accuracy Trade-Off ($k<n$).
Limitations on DP Analysis.
Numerical Evaluation
Privacy Against Membership Inference Attacks.
Privacy Against Reconstruction Attacks.
Privacy Against Label Inference Attacks.
...and 14 more sections

Key Result

Theorem 1

For a given order $\alpha\geq 2$, the RDP privacy budgets $\epsilon_1(\alpha)$, $\epsilon_2(\alpha)$, and $\epsilon_3(\alpha)$ of DP-SL, DP-MixSL, and DP-CutMixSL satisfy an inequality whose displayed form is not preserved in this summary. The inequality is stated in terms of $\epsilon_{1,s}(\alpha) = \frac{\alpha \Delta^2 D_s}{2 \sigma^2_s}$, $\epsilon_{1,y}(\alpha) = \frac{\alpha D_y}{2 \sigma^2_y}$, and $\lambda_{max} = \max_{i\in \mathbb{C}}{\lambda_i}$.
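
The closed-form terms quoted in Theorem 1 can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and reading $\Delta$ as a sensitivity bound, $D_s$ and $D_y$ as the dimensions of the smashed data and the label, and $\sigma_s$, $\sigma_y$ as Gaussian noise standard deviations is an assumption, since this summary does not define them. It deliberately does not combine the terms into $\epsilon_1$, $\epsilon_2$, or $\epsilon_3$, because the theorem's inequality is not shown here.

```python
def eps_1s(alpha, delta, d_s, sigma_s):
    """epsilon_{1,s}(alpha) = alpha * Delta^2 * D_s / (2 * sigma_s^2)."""
    return alpha * delta ** 2 * d_s / (2.0 * sigma_s ** 2)


def eps_1y(alpha, d_y, sigma_y):
    """epsilon_{1,y}(alpha) = alpha * D_y / (2 * sigma_y^2)."""
    return alpha * d_y / (2.0 * sigma_y ** 2)


def lambda_max(lambdas):
    """Largest mixing ratio over the client set."""
    return max(lambdas)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
print(eps_1s(alpha=2, delta=1.0, d_s=4, sigma_s=2.0))
print(eps_1y(alpha=2, d_y=10, sigma_y=1.0))
print(lambda_max([0.2, 0.5, 0.3]))
```

Both terms scale linearly in $\alpha$ and inversely in the noise variance, so doubling $\sigma_s$ cuts $\epsilon_{1,s}(\alpha)$ by a factor of four.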

Figures (11)

Figure 1: Schematic illustration of DP-CutMixSL with 2 clients.
Figure 2: Comparison of CNN and ViT operation from various perspectives.
Figure 3: Structural comparison of (c) DP-CutMixSL with (a) parallel SL (PSL) and (b) split federated learning (SFL) (Thapa et al., 2020). (1) In DP-CutMixSL, the mixer first calculates the $i$-th mixing ratio $\lambda_i$ following the symmetric Dirichlet distribution (Bishop et al., 2007) with mask distribution $\alpha_M$. Depending on $\lambda_i$, the mixer creates the $i$-th mask $\mathbf{M}_i$, randomly selecting $\lceil \lambda_i\cdot N \rceil$ out of a total of $N$ patches. (2) After the client-side forward propagation (FP), the $i$-th client punches the smashed data based on $\mathbf{M}_i$ and adds Gaussian noise, then sends the result to the mixer. (3) The mixer consolidates the patch-wise randomly selected and noise-augmented smashed data received from the clients, producing the smashed data of DP-CutMixSL with all patches intact. This is then transmitted to the server for the remaining SL operations, including server-side forward propagation.
Figure 4: An illustration of DP-CutMixSL with subsampling when $n=4$ and $k=3$.
Figure 5: Accuracy, attack success rate, and $\varepsilon$ under the CIFAR-10 dataset: (a) accuracy of DP-CutMixSL, DP-MixSL, and DP-SL according to $\varepsilon$; (b) attack success rate of membership inference attacks against DP-CutMixSL, DP-MixSL, and DP-SL according to $k$; (c) accuracy of DP-CutMixSL, DP-SL, and DP-MixSL according to $k$; (d) $\varepsilon$ of DP-CutMixSL, DP-SL, and DP-MixSL according to $k$.
...and 6 more figures
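
The three steps in the Figure 3 caption (draw mixing ratios, punch and noise per client, consolidate at the mixer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, smashed data is modelled as a plain list of per-patch feature lists, the Dirichlet draw uses normalised Gamma samples, the rule for trimming patch counts so that they sum to $N$ is an assumption, and noise is added only to the patches a client keeps.

```python
import math
import random


def sample_mixing_ratios(n_clients, alpha_m, rng):
    """Draw (lambda_1, ..., lambda_n) from a symmetric Dirichlet
    by normalising independent Gamma(alpha_m, 1) samples."""
    g = [rng.gammavariate(alpha_m, 1.0) for _ in range(n_clients)]
    total = sum(g)
    return [x / total for x in g]


def make_masks(lams, n_patches, rng):
    """Assign each patch index to exactly one client.
    Client i receives about ceil(lambda_i * N) patches."""
    counts = [math.ceil(lam * n_patches) for lam in lams]
    # Ceilings can overshoot N in total; trimming the largest share
    # is an assumption made for this sketch.
    while sum(counts) > n_patches:
        counts[counts.index(max(counts))] -= 1
    while sum(counts) < n_patches:  # guard against float undershoot
        counts[counts.index(min(counts))] += 1
    owners = [i for i, c in enumerate(counts) for _ in range(c)]
    rng.shuffle(owners)
    return owners  # owners[p] is the client whose patch p is kept


def client_punch_and_noise(smashed, owners, client, sigma, rng):
    """Keep only this client's patches and add Gaussian noise to them."""
    return {
        p: [v + rng.gauss(0.0, sigma) for v in smashed[p]]
        for p, owner in enumerate(owners)
        if owner == client
    }


def mixer_consolidate(pieces, n_patches):
    """Merge the clients' disjoint patch sets into one full sequence."""
    merged = {}
    for piece in pieces:
        merged.update(piece)
    return [merged[p] for p in range(n_patches)]
```

A usage example with 3 clients and 16 patches: call `sample_mixing_ratios(3, 1.0, rng)` with `rng = random.Random(0)`, pass the result to `make_masks`, run `client_punch_and_noise` once per client, and hand the list of results to `mixer_consolidate`. The output has one entry per patch, each taken from exactly one client.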

Theorems & Definitions (7)

Definition 1: ($\varepsilon$,$\delta$)-CDP
Definition 2: $(\alpha, \epsilon)$-RDP
Theorem 1
Corollary 1
Proposition 1
Proposition 2
Proposition 3
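
For orientation, the standard textbook forms behind the two definitions and the RDP-CDP conversion step are given below. Whether the paper uses exactly these forms is an assumption, since the definitions themselves are not reproduced in this summary; here $\mathcal{M}$ denotes a randomized mechanism and $D$, $D'$ are adjacent datasets.

```latex
% Renyi divergence of order \alpha > 1 between distributions P and Q
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% (\alpha, \epsilon)-RDP: for all adjacent D, D'
D_\alpha\big( \mathcal{M}(D) \,\|\, \mathcal{M}(D') \big) \le \epsilon

% (\varepsilon, \delta)-DP: for all adjacent D, D' and all measurable sets S
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta

% Standard conversion: (\alpha, \epsilon)-RDP implies, for any 0 < \delta < 1,
\varepsilon = \epsilon + \frac{\log(1/\delta)}{\alpha - 1}
```

Under this conversion, a smaller RDP budget $\epsilon(\alpha)$ at a fixed order $\alpha$ and fixed $\delta$ translates directly into a smaller $\varepsilon$.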
