SplitOut: Out-of-the-Box Training-Hijacking Detection in Split Learning via Outlier Detection

Ege Erdogan, Unat Teksen, Mehmet Salih Celiktenyildiz, Alptekin Kupcu, A. Ercument Cicek

TL;DR

SplitOut addresses training-hijacking in SplitNN by using an out-of-the-box Local Outlier Factor (LOF) detector applied to gradients gathered during a brief client training phase on a small data fraction (e.g., $1\%$). The method, enhanced by a window-based decision rule, requires minimal hyperparameter tuning and leverages the intrinsic divergence between honest and attack-driven gradient neighborhoods to achieve near-zero false positives across multiple datasets and attack variants, including adaptive multitask attackers. It demonstrates strong detection performance on MNIST, Fashion-MNIST, and CIFAR datasets, with robustness to adaptive strategies and the ability to complement existing defenses like SplitGuard. The work highlights a practical, proactive defense for privacy-preserving split learning and outlines limitations and avenues for future work, such as expanding beyond feature-space alignment attacks and addressing later-epoch attacks.
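
The detection idea summarized above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the from-scratch `SimpleLOF` class, the `hijack_detected` helper, the score threshold of 1.5, the majority vote over a window of 10, and the synthetic Gaussian "gradients" are all assumptions made for illustration. In practice an out-of-the-box LOF implementation from an existing library would take the place of `SimpleLOF`.

```python
import math
import random


def knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `data`."""
    dists = [(math.dist(point, q), j) for j, q in enumerate(data) if j != skip]
    return sorted(dists)[:k]


class SimpleLOF:
    """Minimal Local Outlier Factor fitted on reference (honest) gradients."""

    def __init__(self, k=10):
        self.k = k

    def fit(self, data):
        self.data = data
        neigh = [knn(p, data, self.k, skip=i) for i, p in enumerate(data)]
        # k-distance: distance from each point to its k-th nearest neighbour.
        self.kdist = [n[-1][0] for n in neigh]
        self.lrd = [self._lrd(n) for n in neigh]
        return self

    def _lrd(self, neighbours):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(self.kdist[j], d) for d, j in neighbours]
        return len(reach) / max(sum(reach), 1e-12)

    def score(self, point):
        """LOF of a new point relative to the fitted data (near 1 for inliers)."""
        neighbours = knn(point, self.data, self.k)
        mean_lrd = sum(self.lrd[j] for _, j in neighbours) / len(neighbours)
        return mean_lrd / self._lrd(neighbours)


def hijack_detected(lof, gradients, window=10, threshold=1.5):
    """Assumed decision rule: flag once most gradients in a window are outliers."""
    flags = []
    for g in gradients:
        flags.append(lof.score(g) > threshold)
        recent = flags[-window:]
        if len(recent) == window and sum(recent) > window / 2:
            return True
    return False


def sample(n, dim, mean, rng):
    """Synthetic stand-in for flattened gradient vectors (hypothetical data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
# Reference gradients, standing in for those collected while the client
# trains the full model locally on a small fraction of its data.
honest_reference = sample(200, 8, 0.0, rng)
lof = SimpleLOF(k=10).fit(honest_reference)

honest_stream = sample(30, 8, 0.0, rng)    # server behaves honestly
hijacked_stream = sample(30, 8, 6.0, rng)  # server returns shifted gradients

print("honest stream flagged:  ", hijack_detected(lof, honest_stream))
print("hijacked stream flagged:", hijack_detected(lof, hijacked_stream))
```

Under these synthetic assumptions the shifted stream should trigger the detector while the honest stream should not. The windowed vote is what keeps an occasional high-scoring honest gradient from raising a false alarm.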

Abstract

Split learning enables efficient and privacy-aware training of a deep neural network by splitting the network so that the clients (data holders) compute the first layers and share only the intermediate output with the central, compute-heavy server. This paradigm introduces a new attack medium in which the server has full control over what the client models learn, which has already been exploited to infer the private data of clients and to implement backdoors in the client models. Although previous work has shown that clients can successfully detect such training-hijacking attacks, the proposed methods rely on heuristics, require tuning of many hyperparameters, and do not fully utilize the clients' capabilities. In this work, we show that, given modest assumptions regarding the clients' compute capabilities, an out-of-the-box outlier detection method can be used to detect existing training-hijacking attacks with almost-zero false positive rates. We conclude through experiments on different tasks that the simplicity of our approach, which we name SplitOut, makes it a more viable and reliable alternative to the earlier detection methods.

Paper Structure (24 sections, 7 equations, 6 figures, 11 tables, 1 algorithm)

Figures (6)

  • Figure 1: Potential SplitNN setups. Arrows denote the forward and backward passes, starting with the input data $X$, and propagating backwards after the loss computation using the labels $Y$. In the label-sharing setup (fig:splitnn_label_sharing), clients send the labels to the server along with the intermediate outputs. In the private-label setup (fig:splitnn_private_labels), the model terminates on the client side, and thus the clients do not share their labels.
  • Figure 1: t-SNE [van2008visualizing] dimensionality reduction comparing honest and malicious (FSHA) gradients obtained from a randomly chosen run of the first and second epochs of training on CIFAR10.
  • Figure 2: Results obtained by attackers for the MNIST, F-MNIST, CIFAR10, and CIFAR100 datasets with respect to the detection times (as shown in the detection results table, tab:detection_results_v2). The first row displays the original images, and the last row displays the results obtained by an FSHA [pasquini_unleashing_2021] attacker able to run for an arbitrary duration without being detected.
  • Figure 3: Results the FSHA attacker obtains when it performs multitask learning until detection for the MNIST, F-MNIST, CIFAR10, and CIFAR100 datasets, returning the gradients resulting from the average of honest training and adversarial objectives. The bottom row displays the original inputs.
  • Figure 4: t-SNE [van2008visualizing] dimensionality reduction comparing honest and malicious (FSHA and the backdoor attack) gradients obtained from a randomly chosen run of the first epoch of training on CIFAR10.
  • ...and 1 more figure