SplitNN-driven Vertical Partitioning
Iker Ceballos, Vivek Sharma, Eduardo Mugica, Abhishek Singh, Alberto Roman, Praneeth Vepakomma, Ramesh Raskar
TL;DR
This work extends SplitNN to vertically partitioned data, enabling privacy-preserving collaboration across institutions that hold different feature sets. It systematically compares multiple output-merging strategies, highlighting max pooling as generally performing best while average pooling enables secure aggregation with minor trade-offs. Experiments on Bank Marketing, Give Me Credit, and Financial PhraseBank demonstrate that vertically partitioned training can approach centralized performance, albeit with sensitivity to stragglers and missing client outputs. The study provides practical insights into communication and computation costs and outlines directions for improving robustness and efficiency in privacy-preserving, vertically distributed learning.
Abstract
In this work, we introduce SplitNN-driven Vertical Partitioning, a configuration of the distributed deep learning method SplitNN that facilitates learning from vertically distributed features. SplitNN does not share raw data or model details with collaborating institutions. The proposed configuration allows institutions holding diverse sources of data to train jointly without the need for complex encryption algorithms or secure computation protocols. We evaluate several configurations for merging the outputs of the split models and compare their performance and resource efficiency. The method is flexible and admits many different configurations to tackle the specific challenges posed by vertically split datasets.
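To make the merging step concrete, the following is a minimal sketch of how the outputs of the split models might be combined before being passed to the remaining layers. It is an illustration under stated assumptions and not the authors' implementation: the function names `client_forward` and `merge_outputs`, the toy linear-plus-ReLU partial models, and all weights and feature values are hypothetical. Only the two merging strategies named in the TL;DR, element-wise max pooling and average pooling, are shown.

```python
from typing import List, Sequence

Vector = List[float]


def client_forward(features: Vector, weights: Sequence[Vector]) -> Vector:
    """Hypothetical partial model held by one institution: linear layer + ReLU.

    Each institution only sees its own feature columns and only sends this
    intermediate output onward, never the raw features.
    """
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def merge_outputs(client_outputs: Sequence[Vector], strategy: str = "max") -> Vector:
    """Merge equally sized client outputs element-wise.

    strategy="max" keeps the largest activation per position (max pooling);
    strategy="avg" takes the mean per position (average pooling).
    """
    if not client_outputs:
        raise ValueError("at least one client output is required")
    width = len(client_outputs[0])
    if any(len(out) != width for out in client_outputs):
        raise ValueError("element-wise merging needs equally sized outputs")
    columns = list(zip(*client_outputs))  # group values by output position
    if strategy == "max":
        return [max(col) for col in columns]
    if strategy == "avg":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown strategy: {strategy}")


# Two institutions hold different features for the same record (toy values).
out_a = client_forward([1.0, 2.0], [[0.5, 0.25], [-1.0, 0.5]])
out_b = client_forward([3.0], [[0.5], [1.0]])

merged_max = merge_outputs([out_a, out_b], "max")
merged_avg = merge_outputs([out_a, out_b], "avg")
print(merged_max, merged_avg)
```

Both strategies require every partial model to emit a vector of the same width, which is why the sketch rejects mismatched outputs. The merged vector would then feed the layers that follow the split; those layers are omitted here.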
