Evaluating Privacy Leakage in Split Learning
Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock
TL;DR
The paper investigates privacy leakage in Split Learning (SL) by developing an exhaustive gradient-based attack (EXACT) that reconstructs private client features and labels from cut-layer gradients, activations, and server-side information. Across three tabular datasets, EXACT reveals strong leakage in unmitigated SL/FSL, including near-perfect label reconstruction and high-fidelity recovery of private features. The authors show that a small amount of differential privacy applied to the cut-layer gradients (e.g., DP-SGD with light noise) substantially mitigates leakage with only minor degradation in model performance, whereas Label DP offers limited protection for private features. These findings highlight the practical privacy risks of SL and give concrete mitigation guidance for deploying split architectures on-device in privacy-conscious applications.
Abstract
Privacy-preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to avoid sharing raw data with a third-party server during inference. On-device models are typically less accurate than their server counterparts because (1) they rely on only a small set of on-device features and (2) they need to be small enough to run efficiently on end-user devices. Split Learning (SL) is a promising approach that can overcome these limitations. In SL, a large machine learning model is divided into two parts: the bigger part resides on the server, and the smaller part executes on-device, where it can incorporate the private features. However, end-to-end training of such models requires exchanging gradients at the cut layer, and these gradients might encode private features or labels. In this paper, we provide insights into the potential privacy risks associated with SL and investigate the effectiveness of various mitigation strategies. Our results indicate that the gradients significantly improve the attackers' effectiveness on all tested datasets, reaching almost perfect reconstruction accuracy for some features. However, a small amount of differential privacy (DP) can effectively mitigate this risk without causing significant training degradation.
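To make the gradient exchange at the cut layer concrete, here is a minimal NumPy sketch of one split-learning step on a toy logistic model. It is an illustration under stated assumptions, not the authors' implementation: the linear client and server parts, the function names (`client_forward`, `server_step`, `privatize`), the shapes, and the values of `clip_norm` and `noise_multiplier` are all hypothetical. The `privatize` helper shows DP-SGD-style clipping plus Gaussian noise on the cut-layer gradients, with no privacy accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def client_forward(x_private, w_client):
    """On-device part (assumed linear): private features -> cut-layer activations."""
    return x_private @ w_client

def server_step(h_cut, x_server, y, w_server):
    """Server part (assumed logistic): predicts from server features and activations.

    Returns the mean loss and, for each example, the gradient of that example's
    loss with respect to its cut-layer activation. This gradient is what the
    server would send back to the client.
    """
    z = np.concatenate([x_server, h_cut], axis=1)
    p = sigmoid(z @ w_server)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # For sigmoid + binary cross-entropy, d(loss_i)/d(logit_i) = p_i - y_i.
    d_logit = (p - y)[:, None]
    w_cut = w_server[x_server.shape[1]:]  # server weights acting on the activations
    return loss, d_logit * w_cut[None, :]

def privatize(grad_cut, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to clip_norm, then add Gaussian noise."""
    norms = np.linalg.norm(grad_cut, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grad_cut * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape)
    return clipped, clipped + noise

# Toy data: all sizes and values are made up for illustration.
n, d_private, d_server, d_cut = 8, 3, 4, 2
x_private = rng.normal(size=(n, d_private))
x_server = rng.normal(size=(n, d_server))
y = rng.integers(0, 2, size=n).astype(float)
w_client = rng.normal(size=(d_private, d_cut))
w_server = rng.normal(size=(d_server + d_cut,))

h_cut = client_forward(x_private, w_client)
loss, grad_cut = server_step(h_cut, x_server, y, w_server)

# In this toy model the raw gradient is (p - y) * w_cut, so its projection on
# w_cut is negative exactly when the label is 1: the label is readable from it.
leaked_labels = (grad_cut @ w_server[d_server:] < 0).astype(float)

clipped, noisy = privatize(grad_cut, clip_norm=0.5, noise_multiplier=1.0, rng=rng)

# The client finishes backpropagation using only the noisy gradient it received.
grad_w_client = x_private.T @ noisy / n
```

In this toy setup `leaked_labels` equals `y` before any mitigation, which is one simple way cut-layer gradients can encode labels. After clipping and noising, each received gradient has a bounded norm plus random perturbation, so the same readout becomes less reliable as `noise_multiplier` grows.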
