Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models
Arthur Guijt, Dirk Thierens, Ellen Kerkhof, Jan Wiersma, Tanja Alderliesten, Peter A. N. Bosman
TL;DR
The paper addresses data fragmentation in sensitive domains like medicine by exploring asynchronous collaboration through sharing trained models rather than raw data. It introduces stitch-ensembles, training stitching layers to translate intermediate representations so independently trained networks can be combined, even when architectures differ. A novel double-batched stitching training method is proposed to improve robustness when multiple stitches are used, and ensemble averaging at stitching points further enhances performance. Across two dataset pairs (MRI organ-at-risk vs. polyp segmentation), stitching approaches nearly close the gap to data-sharing and federated learning, demonstrating practical potential for privacy-preserving collaboration and improved generalization.
Abstract
Deep learning has been shown to be very capable at performing many real-world tasks. However, this performance is often dependent on the presence of large and varied datasets. In some settings, like in the medical domain, data is often fragmented across parties, and cannot be readily shared. While federated learning addresses this situation, it is a solution that requires synchronicity of parties training a single model together, exchanging information about model weights. We investigate how asynchronous collaboration, where only already trained models are shared (e.g. as part of a publication), affects performance, and propose to use stitching as a method for combining models. Through taking a multi-objective perspective, where performance on each party's data is viewed independently, we find that a model trained solely on a single party's data performs similarly on that party's data to a model trained on data merged with another party's, while its performance on other parties' data is notably worse. Moreover, while an ensemble of such individually trained networks generalizes better, performance on each party's own dataset suffers. We find that combining intermediate representations in individually trained models with a well-placed pair of stitching layers allows this performance to recover to a competitive degree while maintaining improved generalization, showing that asynchronous collaboration can yield competitive results.
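To make the idea of combining intermediate representations more concrete, the following is a minimal sketch of the data flow only, not the authors' architecture or training procedure. It assumes two frozen networks split into a front and a back half, models the stitching layer as a plain affine map, and averages the translated features with the receiving network's own features at the stitching point. All names (`affine`, `stitch_ensemble`, `front_a`, `front_b`, `stitch_a_to_b`, `back_b`) and all weights are hypothetical placeholders; in practice the stitch would be trained, and only one direction of a stitch pair is shown.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def affine(weights: List[Vector], bias: Vector) -> Layer:
    """Return x -> W x + b, used here as a stand-in for any layer."""
    def apply(x: Vector) -> Vector:
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)
        ]
    return apply


def stitch_ensemble(
    x: Vector,
    front_a: Layer,
    front_b: Layer,
    stitch_a_to_b: Layer,
    back_b: Layer,
) -> Vector:
    # Both (frozen) front halves process the same input independently.
    feat_a = front_a(x)
    feat_b = front_b(x)
    # The stitch translates A's features into B's representation space.
    translated = stitch_a_to_b(feat_a)
    # Ensemble averaging at the stitching point (assumed: simple mean).
    merged = [(t + f) / 2 for t, f in zip(translated, feat_b)]
    # B's back half continues from the merged representation.
    return back_b(merged)


# Toy stand-ins with hand-picked weights (illustrative only, not trained).
front_a = affine([[1, 0], [0, 1], [1, 1]], [0, 0, 0])  # 2 -> 3 features
front_b = affine([[2, 0], [0, 2]], [0, 0])             # 2 -> 2 features
stitch_a_to_b = affine([[1, 0, 0], [0, 1, 0]], [0, 0]) # 3 -> 2 features
back_b = affine([[1, 1]], [0])                         # 2 -> 1 output

output = stitch_ensemble([1, 2], front_a, front_b, stitch_a_to_b, back_b)
```

Note that the two networks may have intermediate representations of different sizes (3 and 2 features here); the stitch absorbs that mismatch, which is what allows models with differing architectures to be combined without either party sharing data.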
