Synthetic Privileged Information Enhances Medical Image Representation Learning

Lucas Farndale, Chris Walsh, Robert Insall, Ke Yuan

TL;DR

This work demonstrates that synthetically generating paired information significantly improves representation learning compared with training on either single-modality data or authentic multimodal paired datasets.

Abstract

Multimodal self-supervised representation learning has consistently proven highly effective in medical image analysis, offering strong task performance and producing biologically informed insights. However, these methods rely heavily on large paired datasets, which prohibits their use where paired data does not exist or only a small amount is available. In contrast, image generation methods can work well on very small datasets and can find mappings between unpaired datasets, meaning an effectively unlimited amount of paired synthetic data can be generated. In this work, we demonstrate that synthetically generating paired information significantly improves representation learning compared with training on either single-modality data (up to 4.4x error reduction) or authentic multimodal paired datasets (up to 5.6x error reduction).
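To make the "x-fold error reduction" figures concrete, here is a minimal sketch of one common way to compute such a factor. It assumes the factor is the ratio of the baseline error rate to the improved error rate, with error defined as 1 minus accuracy; the abstract does not state the exact definition used. The function name and the accuracy values are hypothetical illustrations, not results from the paper.

```python
def error_reduction_factor(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Return baseline error divided by improved error, where error = 1 - accuracy.

    Assumption: this is one common definition of "N-fold error reduction";
    the paper's exact definition is not given in the abstract.
    """
    for name, acc in (("baseline_accuracy", baseline_accuracy),
                      ("improved_accuracy", improved_accuracy)):
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {acc}")

    improved_error = 1.0 - improved_accuracy
    if improved_error == 0.0:
        # A perfect model would make the ratio unbounded.
        raise ValueError("improved model has zero error; the factor is unbounded")

    return (1.0 - baseline_accuracy) / improved_error


if __name__ == "__main__":
    # Hypothetical accuracies chosen for illustration only (not from the paper).
    factor = error_reduction_factor(baseline_accuracy=0.78, improved_accuracy=0.95)
    print(f"{factor:.1f}x error reduction")  # prints "4.4x error reduction"
```

Under this definition, a jump from 78% to 95% accuracy counts as a 4.4x error reduction even though accuracy rises by only 17 percentage points, which is why error ratios are often reported when baselines are already strong.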

Paper Structure (9 sections, 4 figures, 2 tables)

This paper contains 9 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) Schematic of synthetic paired data being generated and passed to the self-supervised model (b) Comparison of standard multimodal self-supervised approaches against our proposed method using synthetically generated data. Images from burlingame2020shift.
  • Figure 2: (a) Guided-GradCAM selvaraju2017grad maps for samples from the NCT dataset. Predicted labels for each classifier are shown, as is the true label. Labels are adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), muscle (MUS), normal mucosa (NORM), stroma (STR), and tumour (TUM). See Figure S1 for examples of all classes; (b) Breakdown of performance on the NCT evaluation task by class; (c) Results for models trained on the NCT dataset with synthetically generated privileged information as pairs for the NCT patches. Values marked with "†" are from farndale2023trident; (d) Classification performance on NCT and Camelyon of models trained on the PanNuke/IHC/SHIFT datasets compared to the synthetically generated images paired with the NCT dataset.
  • Figure 3: (a) Evaluation of representations from models trained on the real PanNuke dataset and the synthetically generated nuclear segmentations paired with the NCT dataset. (b) UMAP projections, coloured by PanNuke label.
  • Figure S1: Representative samples from each NCT tissue class with Guided-GradCAM activation maps. True and predicted labels are shown for each classifier.