Split Learning without Local Weight Sharing to Enhance Client-side Data Privacy
Ngoc Duy Pham, Tran Khoa Phan, Alsharif Abuadbba, Yansong Gao, Doan Nguyen, Naveen Chilamkurti
TL;DR
This work examines privacy vulnerabilities in vanilla split learning (SL) that arise from sharing local model weights among clients. It introduces privacy-enhanced split learning (P-SL), which eliminates client-side weight sharing while keeping the server-side model collaborative, and demonstrates a reduction of up to 50% in client-side data leakage. The authors further extend P-SL with parallel server-side model instances to speed up training and with a server-side cache-based mechanism to mitigate forgetting when late-arriving clients join, showing accuracy comparable to baseline SL/SFL across data distributions. The approach is particularly relevant for IoT and mobile edge environments, where it offers a practical privacy-accuracy balance and supports deployment in dynamic collaborative learning scenarios.
Abstract
Split learning (SL) aims to protect user data privacy by splitting a deep model between client and server and keeping private data local. In SL training with multiple clients, the local model weights are shared among the clients to update the local model. This paper first reveals, through model inversion attacks, that local weight sharing among clients exacerbates data privacy leakage in SL. Then, to reduce this leakage, we propose and analyze privacy-enhanced SL (P-SL), that is, SL without local weight sharing. We further propose parallelized P-SL, which expedites training by duplicating the server-side model into multiple instances without compromising accuracy. Finally, we explore P-SL with late-participating clients and devise a server-side cache-based training method to address the forgetting phenomenon in SL when late clients join. Experimental results demonstrate that P-SL reduces client-side data leakage by up to 50%, achieving a better privacy-accuracy trade-off than the current trend of using differential privacy mechanisms. Moreover, P-SL and its cache-based version achieve accuracy comparable to baseline SL under various data distributions while incurring lower computation and communication costs. Additionally, cache-based training in P-SL mitigates the negative effect of forgetting, stabilizes learning, and enables practical, low-complexity training in a dynamic environment with late-arriving clients.
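
The difference between vanilla SL and P-SL described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a two-layer linear model trained with a squared-error loss, with the first layer on the client and the second on the server. The names `train_round`, `run`, and `share_local_weights` are hypothetical, and vanilla SL's hand-over of client weights is modeled by having every client train a single shared weight slot.

```python
import numpy as np

def train_round(client_weights, server_weight, datasets, share_local_weights, lr=0.05):
    """Run one sequential pass over all clients and return updated weights.

    share_local_weights=True  -> vanilla SL: every client trains the same
                                 client-side weights (slot 0), as if handed over.
    share_local_weights=False -> P-SL: client i only ever touches slot i.
    """
    client_weights = [w.copy() for w in client_weights]
    for i, (x, y) in enumerate(datasets):
        slot = 0 if share_local_weights else i
        w_c = client_weights[slot]
        h = x @ w_c                      # client forward pass ("smashed data")
        err = h @ server_weight - y      # server forward pass and residual
        grad_s = h.T @ err / len(x)      # server-side gradient
        grad_h = err @ server_weight.T   # gradient returned to the client
        grad_c = x.T @ grad_h / len(x)   # client-side gradient
        server_weight = server_weight - lr * grad_s
        client_weights[slot] = w_c - lr * grad_c
    return client_weights, server_weight

# Synthetic data (an assumption for illustration): three clients, one true mapping.
rng = np.random.default_rng(0)
w_true = rng.normal(size=(4, 1))
datasets = []
for _ in range(3):
    x = rng.normal(size=(20, 4))
    datasets.append((x, x @ w_true))

init_client = 0.1 * rng.normal(size=(4, 3))
init_server = 0.1 * rng.normal(size=(3, 1))

def run(share, rounds=50):
    """Train for several rounds, starting every client from the same weights."""
    clients = [init_client.copy() for _ in datasets]
    server = init_server.copy()
    for _ in range(rounds):
        clients, server = train_round(clients, server, datasets, share)
    return clients, server

sl_clients, sl_server = run(share=True)     # vanilla SL: one shared client model
psl_clients, psl_server = run(share=False)  # P-SL: one private client model each
```

In the P-SL run, each client ends up with its own client-side weights while all clients still update the same server-side weights. In the vanilla SL run, only the shared slot is ever trained, which mirrors the weight hand-over that the paper identifies as a source of additional leakage.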
