
Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models

Arthur Guijt, Dirk Thierens, Ellen Kerkhof, Jan Wiersma, Tanja Alderliesten, Peter A. N. Bosman

TL;DR

The paper addresses data fragmentation in sensitive domains such as medicine by exploring asynchronous collaboration, in which parties share trained models rather than raw data. It introduces stitch-ensembles, in which stitching layers are trained to translate intermediate representations so that independently trained networks can be combined, even when their architectures differ. A novel double-batched stitching training method is proposed to improve robustness when multiple stitches are used, and averaging representations at stitching points further improves performance. Across two dataset pairs (one for MRI organ-at-risk segmentation, one for polyp segmentation), stitching approaches nearly close the gap to data sharing and federated learning, demonstrating practical potential for privacy-preserving collaboration and improved generalization.
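
The core operation behind a stitch-ensemble is a stitching layer that maps the intermediate representation of one network onto the representation expected at some layer of another. The sketch below assumes the simplest case: an affine stitch applied per spatial position (for convolutional feature maps this is equivalent to a 1x1 convolution), fitted in closed form by least squares on paired activations collected from the same inputs. The helper names `fit_linear_stitch` and `apply_stitch` are hypothetical, and the closed-form fit is a stand-in: the paper trains its stitching layers, and its double-batched training method is not reproduced here.

```python
import numpy as np


def fit_linear_stitch(feats_a: np.ndarray, feats_b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit weight W and bias c so that feats_a @ W + c approximates feats_b.

    Hypothetical helper (least-squares stand-in for a trained stitch).
    feats_a: (n, d_a) activations of network A at its stitching point.
    feats_b: (n, d_b) activations of network B at its stitching point,
    computed on the same n inputs. For conv feature maps, flatten batch and
    spatial positions into n; the fitted map then acts like a 1x1 conv.
    """
    n = feats_a.shape[0]
    # Append a constant column so the bias is fitted jointly with the weight.
    design = np.hstack([feats_a, np.ones((n, 1))])
    params, *_ = np.linalg.lstsq(design, feats_b, rcond=None)
    return params[:-1], params[-1]


def apply_stitch(feats_a: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Translate network A's representation into network B's representation space.
    return feats_a @ weight + bias


# Usage with toy activations (not data from the paper).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(256, 8))         # stand-in for network A activations
true_weight = rng.normal(size=(8, 4))
feats_b = feats_a @ true_weight + 0.5       # stand-in for network B activations
weight, bias = fit_linear_stitch(feats_a, feats_b)
stitched = apply_stitch(feats_a, weight, bias)
```

Plain MSE is only one possible objective for a stitch; the Figure 3 caption notes that whether a difference is measured before or after a ReLU changes how much it matters.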

Abstract

Deep learning has been shown to perform very well on many real-world tasks. However, this performance often depends on the availability of large and varied datasets. In some settings, such as the medical domain, data is often fragmented across parties and cannot be readily shared. While federated learning addresses this situation, it requires the parties to train a single model together synchronously, exchanging information about model weights. We investigate how asynchronous collaboration, where only already trained models are shared (e.g. as part of a publication), affects performance, and propose stitching as a method for combining models. Taking a multi-objective perspective, where performance on each party's data is viewed independently, we find that, when evaluated on a single party's data, training solely on that party's data performs similarly to training on that data merged with another party's data, while performance on the other party's data is notably worse. Moreover, while an ensemble of such individually trained networks generalizes better, performance on each party's own dataset suffers. We find that combining intermediate representations of individually trained models with a well-placed pair of stitching layers allows this performance to recover to a competitive degree while maintaining improved generalization, showing that asynchronous collaboration can yield competitive results.
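
The multi-objective perspective keeps performance on each party's data as a separate objective instead of averaging it into one number. A minimal sketch, assuming higher scores are better (for example a per-party segmentation score) and using a standard Pareto-dominance test; the helper names are hypothetical and the paper's exact comparison procedure may differ.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a Pareto-dominates b (higher is better).

    Each entry is a model's score on one party's held-out data. a dominates b
    if it is at least as good for every party and strictly better for one.
    """
    if len(a) != len(b):
        raise ValueError("score vectors must cover the same parties")
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(models: dict[str, Sequence[float]]) -> list[str]:
    """Names of models whose per-party scores no other model dominates."""
    return [
        name
        for name, scores in models.items()
        if not any(
            dominates(other, scores)
            for other_name, other in models.items()
            if other_name != name
        )
    ]


# Toy, hypothetical per-party scores (party A, party B), not results from the paper.
scores = {
    "model_1": [0.9, 0.6],
    "model_2": [0.7, 0.85],
    "model_3": [0.6, 0.5],
}
print(non_dominated(scores))  # → ['model_1', 'model_2']
```

Under this view, a model that is best on party A and a model that is best on party B can both be non-dominated, a trade-off that a single averaged score would hide.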

Paper Structure

This paper contains 53 sections, 4 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: A schematic representation of stitching.
  • Figure 2: A schematic representation of stitching.
  • Figure 3: The mean squared error between two values differs depending on whether it is measured before or after a ReLU activation. Consequently, some differences matter more than others to the functioning of the neural network, and some can be ignored entirely.
  • Figure 4: When using stitching to combine networks, rather than selecting one intermediate representation, we use the average of the original and stitched representations. This was done by adding to each switch a third, averaged option alongside the two input feature maps.
  • Figure 5:
  • ...and 7 more figures
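
The Figure 3 and Figure 4 captions describe two small, concrete mechanisms. For Figure 3, two negative pre-activations that differ both map to zero after a ReLU, so an error that is nonzero before the activation can vanish after it. For Figure 4, a switch at a stitching point can return the original feature map, the stitched one, or their average. The sketch below illustrates both in plain Python; the function names and option labels are hypothetical, and feature maps are reduced to flat lists for brevity.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_pre_and_post_relu(xs, ys):
    """MSE between two pre-activation vectors, and between their ReLU outputs."""
    return mse(xs, ys), mse(relu(xs), relu(ys))


def switch(original, stitched, option):
    """Three-way switch at a stitching point (hypothetical option labels).

    'original' keeps the network's own feature map, 'stitched' uses the
    stitched one, and 'average' returns their element-wise mean, the extra
    option described in the Figure 4 caption.
    """
    if option == "original":
        return list(original)
    if option == "stitched":
        return list(stitched)
    if option == "average":
        return [(a + b) / 2 for a, b in zip(original, stitched)]
    raise ValueError(f"unknown switch option: {option!r}")


# -3.0 and -1.0 differ before the ReLU but both become 0.0 after it.
print(mse_pre_and_post_relu([-3.0, 2.0], [-1.0, 2.0]))  # → (2.0, 0.0)
print(switch([1.0, 3.0], [3.0, 5.0], "average"))        # → [2.0, 4.0]
```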