Split learning for health: Distributed deep learning without sharing raw patient data
Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar
TL;DR
The paper addresses privacy-preserving collaborative learning in healthcare by introducing and detailing SplitNN, a distributed deep learning framework that keeps raw patient data local while sharing activations and gradients. It outlines multiple configurations to handle vertically partitioned, multi-modal, and label-sensitive scenarios, and demonstrates that SplitNN can achieve high accuracy with substantially reduced client-side computation and favorable bandwidth characteristics compared to federated learning and large-batch SGD. The work emphasizes practical applicability in hospitals and edge environments, and suggests future directions including more configurations and integration with model compression. Overall, SplitNN offers a flexible, resource-efficient approach for privacy-conscious health analytics across institutions.
Abstract
Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, and iii) learning without sharing labels. We compare the performance and resource-efficiency trade-offs of SplitNN against other distributed deep learning methods, namely federated learning and large-batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN.
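To make the idea of exchanging activations and gradients instead of raw data concrete, the following is a minimal sketch of one split-training loop and is not the authors' implementation. It assumes the simplest arrangement, in which the client holds the raw inputs and the first part of the model while the server holds the rest of the model and the labels; the configuration that avoids sharing labels is not shown. The class names `Client` and `Server`, the one-weight linear "layers", the toy data (y = 2x), and the learning rate are all hypothetical choices made for illustration.

```python
class Client:
    """Holds the raw inputs and the first part of the model (hypothetical 1-D linear layer)."""

    def __init__(self, xs, w=0.5, lr=0.02):
        self.xs = xs      # raw data: never sent to the server
        self.w = w        # client-side weight
        self.lr = lr
        self.last_x = None

    def forward(self, i):
        # Only the cut-layer activation leaves the client.
        self.last_x = self.xs[i]
        return self.w * self.last_x

    def backward(self, grad_activation):
        # Chain rule: dL/dw = dL/da * da/dw, where da/dw = x.
        self.w -= self.lr * grad_activation * self.last_x


class Server:
    """Holds the rest of the model and, in this assumed setup, the labels."""

    def __init__(self, ys, w=0.5, lr=0.02):
        self.ys = ys
        self.w = w        # server-side weight
        self.lr = lr

    def train_step(self, i, activation):
        pred = self.w * activation
        err = pred - self.ys[i]
        loss = err ** 2
        grad_w = 2 * err * activation
        # Gradient w.r.t. the received activation, computed before updating w.
        grad_activation = 2 * err * self.w
        self.w -= self.lr * grad_w
        # Only this gradient is sent back to the client.
        return loss, grad_activation


def train(epochs=100):
    xs = [0.5, 1.0, 1.5, 2.0]          # toy inputs (assumption)
    ys = [2.0 * x for x in xs]         # toy labels: y = 2x (assumption)
    client, server = Client(xs), Server(ys)
    loss = None
    for _ in range(epochs):
        for i in range(len(xs)):
            activation = client.forward(i)                 # client -> server
            loss, grad = server.train_step(i, activation)  # server -> client
            client.backward(grad)
    return client, server, loss


if __name__ == "__main__":
    client, server, loss = train()
    print("learned slope:", client.w * server.w, "last loss:", loss)
```

In this sketch the server only ever receives `activation` and the client only ever receives `grad`, yet the composed model `client.w * server.w` is trained end to end as if it lived on one machine.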
