FISHing in Uncertainty: Synthetic Contrastive Learning for Genetic Aberration Detection

Simon Gutwein, Martin Kampel, Sabine Taschner-Mandl, Roxane Licandro

TL;DR

This work introduces a novel approach that leverages synthetic images to eliminate the requirement for manual annotations and utilizes a joint contrastive and classification objective for training to account for inter-class variation effectively.

Abstract

Detecting genetic aberrations is crucial in cancer diagnosis, typically through fluorescence in situ hybridization (FISH). However, existing FISH image classification methods are challenged by signal variability, require costly manual annotations, and fail to adequately address the intrinsic uncertainty. We introduce a novel approach that leverages synthetic images to eliminate the requirement for manual annotations and utilizes a joint contrastive and classification objective for training to account for inter-class variation effectively. We demonstrate the superior generalization capabilities and uncertainty calibration of our method, which is trained on synthetic data, by testing it on a manually annotated dataset of real-world FISH images. Our model offers superior calibration in terms of classification accuracy and uncertainty quantification, reaching a classification accuracy of 96.7% among the 50% most certain cases. The presented end-to-end method reduces the demands on personnel and time and improves the diagnostic workflow due to its accuracy and adaptability. All code and data are publicly accessible at: https://github.com/SimonBon/FISHing
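
The figure "accuracy among the 50% most certain cases" can be read as accuracy at a fixed coverage: rank the predictions by certainty, keep the top fraction, and score only those. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `accuracy_at_coverage` and the choice of certainty score (for example, the maximum softmax probability) are assumptions made for illustration.

```python
import math


def accuracy_at_coverage(certainties, predictions, labels, coverage=0.5):
    """Accuracy over the `coverage` fraction of samples with the highest certainty.

    `certainties` is any per-sample score where higher means more certain
    (assumed here; the abstract does not define it).
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n = len(labels)
    if n == 0 or len(certainties) != n or len(predictions) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    # Keep at least one sample, rounding the kept count up.
    keep = max(1, math.ceil(n * coverage))

    # Indices sorted from most to least certain.
    order = sorted(range(n), key=lambda i: certainties[i], reverse=True)
    kept = order[:keep]

    correct = sum(1 for i in kept if predictions[i] == labels[i])
    return correct / keep


# Toy example with made-up values (not data from the paper).
toy_certainties = [0.90, 0.60, 0.80, 0.55]
toy_predictions = [0, 1, 2, 1]
toy_labels = [0, 2, 2, 1]
top_half_accuracy = accuracy_at_coverage(toy_certainties, toy_predictions, toy_labels, 0.5)
```

Sweeping `coverage` from 1.0 downward traces an accuracy-versus-remaining-data curve: a well-calibrated model should become more accurate as less certain cases are set aside.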

Paper Structure

This paper contains 7 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: FISH-image patches with stained cell nuclei in the blue image channel for rows A and B. A: HER2 (target) as red signals, CEN17q (reference) as green signals. B: Synthetic images of MYCN FISH, with MYCN (target) as green signals and NMI (reference) as red signals, showing the diagnostic classes MYCN Normal ($c_N$), Gain ($c_G$), and Amplified ($c_A$) alongside real-world image examples. n indicates the number of MYCN signals.
  • Figure 2: Configuration of signals $Q$ is input to FISHPainter, producing image $X$ and label $Y$. Image $X$ is augmented with transformations $t$ sampled from set $T$ to create views $X_i$ and $X_j$. These views are embedded into representation space $R$ and projected to $Z$. Contrastive loss is calculated on $Z$. $R$ is also used by the classification layer to predict $\hat{Y}$, on which CE loss is computed.
  • Figure 3: A: 2D latent space visualization of Ours Heavy on the synthetic training dataset (blue), the real-world test dataset (yellow) and the OOD dataset (red). B: Comparison of model to human certainty when classifying synthetic FISH images with a set number of green signals. C: Accuracy results and the distribution of remaining classes, conditioned on certainty (example: at 0.8 certainty, $\sim78\%$ of the data remains with $\sim93\%$ overall accuracy).
  • Figure 4: A: ECE, positive ECE, and negative ECE comparison between our method and all baseline approaches on real-world data. B: Calibration chart for Ours Heavy, showing the sample distribution across certainty bins.
  • Figure S1: The experimental results are shown in subplots A-D for Resnet+CE, CL+Attached, CL+Detached, and Ours Light. Each subplot displays latent representations of the synthetic training dataset (blue), real-world test set (yellow), and OOD dataset (red). Below, the left plot shows ECE (green for underconfidence, red for overconfidence), and the right plot shows accuracy and class distribution across certainty thresholds. A data flow diagram below these plots illustrates the training methods for each model. The snowflake in C indicates frozen weights during fine-tuning.
  • ...and 2 more figures
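
The training objective described in the Figure 2 caption (a contrastive loss on the projections $Z$ of two augmented views, plus a cross-entropy loss on the predictions $\hat{Y}$) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the caption does not name the contrastive loss, so NT-Xent (the SimCLR loss) stands in for it, and the `temperature`, the `weight` between the two terms, and the averaging of the CE loss over both views are hypothetical choices.

```python
import numpy as np


def log_softmax(x):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def nt_xent(z_i, z_j, temperature=0.5):
    """NT-Xent contrastive loss (assumed stand-in) for two batches of projections."""
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own positive or negative
    # The positive for row k is the other view of the same image.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    return -log_softmax(sim)[np.arange(2 * n), positives].mean()


def cross_entropy(logits, labels):
    """Mean cross-entropy between class logits and integer labels."""
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()


def joint_loss(z_i, z_j, logits_i, logits_j, labels, weight=1.0, temperature=0.5):
    """Contrastive term on Z plus a weighted CE term on the class predictions."""
    contrastive = nt_xent(z_i, z_j, temperature)
    ce = 0.5 * (cross_entropy(logits_i, labels) + cross_entropy(logits_j, labels))
    return contrastive + weight * ce


# Toy batch: 4 images, 8-dim projections, 3 classes (Normal, Gain, Amplified).
rng = np.random.default_rng(0)
z_i, z_j = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
logits_i, logits_j = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 1])
loss = joint_loss(z_i, z_j, logits_i, logits_j, labels)
```

In a real training loop both terms would be backpropagated through a shared encoder, so the representation $R$ is shaped by the contrastive and the classification signal at the same time.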