A Comparison of Methods for Neural Network Aggregation
John Pomerat, Aviv Segev
TL;DR
This work addresses privacy concerns in joint neural network training across multiple organizations by adopting secure multi-party computation (MPC) to enable aggregation without exposing private data. It formalizes neural network aggregation and compares three methods (series network learning, average ensemble learning, and transfer learning) against a data-sharing baseline on both synthetic regression and a real breast cancer classification task. The results show that series network learning and transfer learning often match or surpass the performance of the data-sharing baseline, highlighting their potential as privacy-preserving alternatives in healthcare ML; however, membership inference remains a security concern to be mitigated in future studies. The study demonstrates a practical path toward collaborative ML across private datasets with comparable model quality, underscoring both the promise and the challenges of scalable privacy-preserving neural network aggregation.
Abstract
Deep learning has achieved considerable success in theory. To succeed in industry, however, deep learning algorithms must be able to handle the many inconsistencies that appear in real data, which can strongly affect how an algorithm is implemented. Artificial intelligence is currently changing the medical industry, but obtaining authorization to use medical data for training machine learning algorithms is a major hurdle. A possible solution is to share the data without sharing patient information. We propose a multi-party computation protocol for deep learning that preserves both the privacy and the security of the training data. Three approaches to neural network assembly are analyzed: transfer learning, average ensemble learning, and series network learning. The results are compared with data-sharing approaches in several experiments, and we analyze the security issues of the proposed protocol. Although the analysis is based on medical data, the results on multi-party computation for machine learning training are theoretical in nature and can be applied in multiple research areas.
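As an informal illustration of one of the three approaches named above, the following minimal sketch shows the general idea behind average ensemble learning: each party fits a model on its own private data, and only the fitted models are combined by averaging their predictions. This is not the protocol proposed in the paper. It assumes a one-dimensional least-squares line as a stand-in for each party's neural network, uses synthetic data, and omits the cryptographic machinery of secure multi-party computation. All names (`fit_line`, `average_ensemble`, `make_private_data`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Fit y = a*x + b by least squares (stand-in for one party's private network)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def average_ensemble(models, x):
    """Average the predictions of independently trained models at input x."""
    preds = [a * x + b for a, b in models]
    return sum(preds) / len(preds)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_private_data(n):
    """Synthetic private dataset: y = 2x + 0.5 plus Gaussian noise (assumed for illustration)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

# Each party trains locally; raw data never leaves make_private_data's caller.
parties = [make_private_data(50) for _ in range(3)]
models = [fit_line(xs, ys) for xs, ys in parties]

# Only the fitted models are combined. The noise-free target at x = 0.25 is 1.0.
prediction = average_ensemble(models, 0.25)
print(f"ensemble prediction at x=0.25: {prediction:.3f}")
```

In this sketch the raw `(xs, ys)` pairs stay with each party and only the fitted parameters are pooled. Sharing parameters in the clear can still leak information about the training data, which is the kind of security issue the abstract says the paper analyzes.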
