Split learning for health: Distributed deep learning without sharing raw patient data

Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar

TL;DR

The paper addresses privacy-preserving collaborative learning in healthcare by introducing and detailing SplitNN, a distributed deep learning framework that keeps raw patient data local while sharing activations and gradients. It outlines multiple configurations to handle vertically partitioned, multi-modal, and label-sensitive scenarios, and demonstrates that SplitNN can achieve high accuracy with substantially reduced client-side computation and favorable bandwidth characteristics compared to federated learning and large-batch SGD. The work emphasizes practical applicability in hospitals and edge environments, and suggests future directions including more configurations and integration with model compression. Overall, SplitNN offers a flexible, resource-efficient approach for privacy-conscious health analytics across institutions.

Abstract

Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks, and iii) learning without sharing labels. We compare the performance and resource-efficiency trade-offs of SplitNN with those of other distributed deep learning methods, such as federated learning and large-batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN.
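The exchange of activations and gradients described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the simplest setup: one client, one server, and labels that are available to the server. The label-private and multi-modal configurations mentioned in the abstract are not shown. The toy data, layer sizes, learning rate, and the function names `client_forward`, `server_step`, and `client_backward` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data that never leaves the client: 64 samples, 8 features.
X = rng.normal(size=(64, 8))
y = (X[:, :1] + X[:, 1:2] > 0).astype(float)  # binary labels, shape (64, 1)

# The client holds the layers up to the cut; the server holds the rest.
W_client = rng.normal(scale=0.5, size=(8, 16))
W_server = rng.normal(scale=0.5, size=(16, 1))


def client_forward(x):
    """Client: compute cut-layer activations (the only data sent forward)."""
    return np.tanh(x @ W_client)


def server_step(activations, labels, lr):
    """Server: finish the forward pass, update its own weights, and return
    the loss and the gradient with respect to the received activations."""
    global W_server
    logits = activations @ W_server
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))
    grad_logits = (probs - labels) / len(labels)
    grad_activations = grad_logits @ W_server.T  # computed before the update
    W_server -= lr * activations.T @ grad_logits
    return loss, grad_activations


def client_backward(x, activations, grad_activations, lr):
    """Client: continue backpropagation from the gradient sent by the server."""
    global W_client
    grad_pre = grad_activations * (1.0 - activations ** 2)  # tanh derivative
    W_client -= lr * x.T @ grad_pre


losses = []
for _ in range(300):
    a = client_forward(X)                 # client -> server: activations
    loss, g = server_step(a, y, lr=0.1)   # server -> client: gradients
    client_backward(X, a, g, lr=0.1)
    losses.append(loss)
```

In this sketch only `a` and `g` cross the client/server boundary. `X` and `W_client` stay on the client, and `W_server` stays on the server.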

Paper Structure

This paper contains 7 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Distributed learning over retinopathy images (or images of undetected, fast-moving threats) across slow bit-rate (‘snail-pace’) links, allowing entities to detect an emerging threat by pooling their images without exchanging raw patient data.
  • Figure 2: Split learning configurations for health, showing that raw data is not transferred between the client and server health entities for training and inference of distributed deep learning models with SplitNN.
  • Figure 3: We show a dramatic reduction in computational burden (in TFLOPs) while maintaining higher accuracies when training over a large number of clients with SplitNN. The blue line denotes distributed deep learning using SplitNN, the red line indicates federated averaging, and the green line indicates large-batch SGD.
  • Figure 4: Split learning configurations for health, showing that raw data is not transferred between the client and server health entities for training and inference of distributed deep learning models with SplitNN.