Passive Inference Attacks on Split Learning via Adversarial Regularization
Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, Beng Chin Ooi
TL;DR
This work reveals a significant privacy vulnerability in Split Learning (SL) by introducing SDAR, a passive, GAN-inspired attack that trains a simulator and a decoder to infer clients’ private data from intermediate representations. By combining adversarial regularization with an auxiliary dataset, SDAR generalizes to unseen client data and, in U-shaped SL, recovers both features and labels. It outperforms prior passive attacks and approaches the performance of active hijacking attacks on standard benchmarks such as CIFAR-10 at deep split levels. The study also shows that the attack is robust to limited auxiliary data, architecture uncertainty, and mild data heterogeneity. It further evaluates potential defenses and finds that many common protections (regularization, decorrelation, HE/MPC/DP) have limited effect unless they also degrade model utility. Overall, the results highlight practical privacy risks in SL. For practitioners deploying SL, they call for careful architectural choices and for cryptographic or otherwise rigorous privacy guarantees to mitigate such passive inference.
Abstract
Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more capable attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can effectively infer the client's private features under vanilla SL, and both features and labels under U-shaped SL. We perform extensive experiments in both configurations to validate the effectiveness of our proposed attacks. Notably, in challenging scenarios where existing passive attacks struggle to reconstruct the client's private data, SDAR consistently achieves substantially better attack performance, comparable even to that of active attacks. On CIFAR-10, at a deep split level of 7, SDAR reconstructs private features with a mean squared error below 0.025 in both vanilla and U-shaped SL, and attains a label inference accuracy of over 98% in the U-shaped setting, while existing attacks fail to produce non-trivial results.
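The two headline numbers in the abstract are a feature reconstruction error and a label inference accuracy. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes images are arrays scaled to [0, 1] and that the mean squared error is averaged over all pixels and all images; the exact averaging convention is not stated in this section. The function names `reconstruction_mse` and `label_accuracy` and the toy data are hypothetical.

```python
import numpy as np


def reconstruction_mse(x_true, x_rec):
    """Mean squared error averaged over every pixel of every image.

    Assumption: both arrays share one shape, e.g. (N, 32, 32, 3),
    with values scaled to [0, 1].
    """
    x_true = np.asarray(x_true, dtype=np.float64)
    x_rec = np.asarray(x_rec, dtype=np.float64)
    if x_true.shape != x_rec.shape:
        raise ValueError("x_true and x_rec must have the same shape")
    return float(np.mean((x_true - x_rec) ** 2))


def label_accuracy(y_true, y_pred):
    """Fraction of inferred labels that match the private labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Toy data standing in for private inputs and an attacker's output.
rng = np.random.default_rng(0)
x_private = rng.random((4, 32, 32, 3))   # hypothetical private images
x_reconstructed = x_private + 0.1        # uniform error of 0.1 per pixel

y_private = np.array([3, 1, 4, 1])       # hypothetical private labels
y_inferred = np.array([3, 1, 4, 0])      # one of four labels is wrong

mse = reconstruction_mse(x_private, x_reconstructed)
acc = label_accuracy(y_private, y_inferred)
print(f"reconstruction MSE: {mse:.4f}")
print(f"label accuracy:     {acc:.2f}")
```

In this toy case every pixel is off by exactly 0.1, so the MSE is 0.1 squared, about 0.01, and three of four labels match. Under the assumed convention, the reported threshold of 0.025 corresponds to a root mean squared per-pixel error of roughly 0.16 on a [0, 1] scale.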
