SplitNN-driven Vertical Partitioning

Iker Ceballos, Vivek Sharma, Eduardo Mugica, Abhishek Singh, Alberto Roman, Praneeth Vepakomma, Ramesh Raskar

TL;DR

This work extends SplitNN to vertically partitioned data, enabling privacy-preserving collaboration across institutions that hold different feature sets. It systematically compares multiple output-merging strategies, highlighting max pooling as generally performing best while average pooling enables secure aggregation with minor trade-offs. Experiments on Bank Marketing, Give Me Credit, and Financial PhraseBank demonstrate that vertically partitioned training can approach centralized performance, albeit with sensitivity to stragglers and missing client outputs. The study provides practical insights into communication and computation costs and outlines directions for improving robustness and efficiency in privacy-preserving, vertically distributed learning.

Abstract

In this work, we introduce SplitNN-driven Vertical Partitioning, a configuration of the distributed deep learning method SplitNN that facilitates learning from vertically distributed features. SplitNN does not share raw data or model details with collaborating institutions. The proposed configuration allows training among institutions holding diverse sources of data without the need for complex encryption algorithms or secure computation protocols. We evaluate several configurations for merging the outputs of the split models and compare their performance and resource efficiency. The method is flexible and allows many different configurations to tackle the specific challenges posed by vertically split datasets.
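
To make the idea of merging the outputs of the split models concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `merge_outputs` and the array shapes are hypothetical, and the `"concat"` option is an assumption added for contrast, since this summary names only max pooling and average pooling.

```python
import numpy as np

def merge_outputs(client_outputs, strategy="max"):
    """Merge per-client activations of shape (batch, dim) on the server.

    Hypothetical helper for illustration only.
    """
    if strategy == "concat":
        # Assumed extra option: stack features side by side -> (batch, clients * dim).
        return np.concatenate(client_outputs, axis=1)

    stacked = np.stack(client_outputs, axis=0)  # (clients, batch, dim)
    if strategy == "max":
        # Element-wise max pooling across clients.
        return stacked.max(axis=0)
    if strategy == "average":
        # Element-wise average pooling across clients.
        return stacked.mean(axis=0)
    raise ValueError(f"unknown strategy: {strategy}")

# Two clients, one sample, two output units each.
a = np.array([[1.0, 4.0]])
b = np.array([[3.0, 2.0]])
merged_max = merge_outputs([a, b], "max")
merged_avg = merge_outputs([a, b], "average")
merged_cat = merge_outputs([a, b], "concat")
```

The element-wise strategies keep the server input size fixed regardless of the number of clients, whereas concatenation grows with it. Averaging is also a linear operation, which is the property that makes it compatible with secure aggregation.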

Paper Structure

This paper contains 9 sections, 2 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Vertical SplitNN architecture. Each client computes a fixed portion of the computation graph and passes its output to the server, which computes the rest, performs back-propagation, and returns the Jacobians to the clients so that each can complete its own back-propagation.
  • Figure 2: Comparison of several merging strategies for SplitNN-driven vertical partitioning with PhraseBank.
  • Figure 3: Loss and metrics for PhraseBank dataset while workers drop during training.
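
The training flow described in the Figure 1 caption can be sketched end to end in NumPy. This is a toy illustration under stated assumptions, not the paper's setup: the synthetic data, the single `tanh` client layer, the sigmoid server head, the average merge, and all hyperparameters are hypothetical. For simplicity the server returns the gradient of the loss with respect to each client's output rather than a full Jacobian.

```python
import numpy as np

def train_vertical_splitnn(steps=200, lr=0.5, hidden=4, seed=0):
    """Toy vertical SplitNN loop with two clients; returns the loss per step."""
    rng = np.random.default_rng(seed)

    # Synthetic data (assumption): 64 samples, 6 features, binary label.
    X = rng.normal(size=(64, 6))
    y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

    # Vertical partition: each client holds a different subset of columns.
    client_features = [X[:, :3], X[:, 3:]]
    client_weights = [rng.normal(scale=0.5, size=(f.shape[1], hidden))
                      for f in client_features]
    server_weights = rng.normal(scale=0.5, size=(hidden, 1))

    losses = []
    for _ in range(steps):
        # Client side: forward pass on local features only.
        acts = [np.tanh(f @ w) for f, w in zip(client_features, client_weights)]

        # Server side: merge client outputs (average) and finish the forward pass.
        merged = np.mean(acts, axis=0)
        probs = 1.0 / (1.0 + np.exp(-(merged @ server_weights)))
        eps = 1e-12
        loss = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
        losses.append(float(loss))

        # Server side: back-propagate down to the merged activations.
        grad_logits = (probs - y) / len(y)
        grad_merged = grad_logits @ server_weights.T
        server_weights = server_weights - lr * (merged.T @ grad_logits)

        # Gradient sent back to each client; averaging splits it equally.
        grad_to_client = grad_merged / len(acts)

        # Client side: finish back-propagation through the local tanh layer.
        for i, (f, a) in enumerate(zip(client_features, acts)):
            grad_pre = grad_to_client * (1.0 - a ** 2)
            client_weights[i] = client_weights[i] - lr * (f.T @ grad_pre)
    return losses

losses = train_vertical_splitnn()
```

Only activations travel from clients to the server and only gradients travel back, so neither raw features nor client weights leave their owner in this sketch.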