Siamese Networks with Soft Labels for Unsupervised Lesion Detection and Patch Pretraining on Screening Mammograms

Kevin Van Vorst, Li Shen

TL;DR

This work tackles the challenge of pretraining classifiers for mammography when labeled data are scarce by exploiting bilateral symmetry to form unlabeled patch pairs. It introduces a Siamese network with soft labels derived from a Gaussian mixture model on patch-embedding distances, and employs a dual-network setup with cross-regularized losses to stabilize training. Across VinDr and OPTIMAM datasets, the approach yields superior or competitive performance on downstream tasks such as abnormal vs normal patch classification, BI-RADS classification, and outcome prediction, compared to standard self-supervised baselines. The method reduces reliance on costly annotations and demonstrates potential for scalable, unsupervised pretraining in medical imaging with practical implications for CAD tools and early lesion detection.

Abstract

Self-supervised learning has become a popular way to pretrain a deep learning model and then transfer it to perform downstream tasks. However, most of these methods are developed on large-scale image datasets that contain natural objects with clear textures, outlines, and distinct color contrasts. It remains uncertain whether these methods are equally effective for medical imaging, where the regions of interest often blend subtly and indistinctly with the surrounding tissues. In this study, we propose an alternative method that uses contralateral mammograms to train a neural network to encode similar embeddings when both images in a pair are normal and different embeddings when a pair contains a normal and an abnormal image. Our approach leverages the natural symmetry of the human body as weak labels to learn to distinguish abnormal lesions from background tissues in a fully unsupervised manner. Our findings suggest that this is feasible when soft labels derived from the Euclidean distances between the embeddings of the image pairs are incorporated into the Siamese network loss. Our method demonstrates superior performance in mammogram patch classification compared to existing self-supervised learning methods. This approach not only leverages a vast amount of image data effectively but also minimizes reliance on costly labels, a significant advantage particularly in the field of medical imaging.

Paper Structure (20 sections, 7 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Two parallel networks, with shared weights $\theta$, process a pair of patches and return embeddings $e_1$ and $e_2$. The Euclidean distance, $D=d(e_1,e_2)$, is calculated. A two-component GMM is fit to the distances $D$ from the entire training set to obtain the soft label $P$. The two embeddings are concatenated into a single vector $e$ and passed through a fully connected layer with sigmoid activation to get $q$.
  • Figure 2: A pair of mammogram patches is encoded by two separate Siamese networks, resulting in normal pair probabilities $q_1$ and $q_2$ and Euclidean distances $D_1$ and $D_2$. The two Euclidean distances are used in their respective GMMs to get the soft labels $P_1$ and $P_2$.
  • Figure 3: Example patch pairs of size $256\times256$ sampled from the VinDr dataset.
  • Figure 4: The GMMs at the conclusion of training on the VinDr-256 paired patch dataset. Provided are some examples of abnormal patch pairs that are correctly identified and normal patch pairs that are incorrectly identified as abnormal by the model. The posterior probability $P$, Siamese network similarity probability $q$, and patch pair Euclidean distance $D$ of each network for the patch pairs are shown.
  • Figure 5: t-SNE plots of 10,000 samples in the VinDr-256, VinDr-96, OPTIMAM-256, and OPTIMAM-96 paired patch datasets labeled by the proportion of abnormal area $A$.
  • ...and 1 more figures
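
The soft-label step described in the Figure 1 caption can be sketched in a few lines. The following is a minimal, standard-library-only illustration, not the authors' implementation: it computes the Euclidean distance $D$ between two embeddings, fits a two-component one-dimensional GMM to a set of distances with plain EM, and takes the posterior of the lower-mean component as the soft label $P$. Treating the lower-mean component as the "normal pair" component is an assumption based on the abstract (normal pairs should have similar embeddings). The function names `fit_gmm_1d`, `soft_label`, and `soft_bce`, the quartile-based initialization, and the use of binary cross-entropy between $q$ and $P$ are all hypothetical choices for illustration; the exact loss is not given in this summary.

```python
import math
import random


def euclidean(e1, e2):
    """Euclidean distance D = d(e1, e2) between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def _normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(distances, n_iter=200, min_var=1e-6):
    """Fit a two-component 1-D GMM with plain EM.

    Returns (weights, means, variances), each ordered by ascending mean.
    Initialization at the lower and upper quartiles is an illustrative choice.
    """
    xs = list(distances)
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(xs) / n
    var0 = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p0 = weights[0] * _normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * _normal_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append((p0 / total, p1 / total) if total > 0 else (0.5, 0.5))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, min_var)
    order = sorted(range(2), key=lambda k: means[k])
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [variances[k] for k in order],
    )


def soft_label(d, gmm):
    """Posterior P that distance d belongs to the lower-mean component.

    Assumption: the lower-mean component models normal (similar) pairs.
    """
    weights, means, variances = gmm
    p_low = weights[0] * _normal_pdf(d, means[0], variances[0])
    p_high = weights[1] * _normal_pdf(d, means[1], variances[1])
    total = p_low + p_high
    return p_low / total if total > 0 else 0.5


def soft_bce(q, p, eps=1e-7):
    """Binary cross-entropy between prediction q and soft target p.

    Assumption: a stand-in for the Siamese loss, whose exact form is not
    stated in this summary.
    """
    q = min(max(q, eps), 1.0 - eps)
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


if __name__ == "__main__":
    # Synthetic distances: a tight low-distance cluster and a wider high one.
    random.seed(0)
    ds = [random.gauss(1.0, 0.2) for _ in range(500)]
    ds += [random.gauss(4.0, 0.5) for _ in range(500)]
    gmm = fit_gmm_1d(ds)
    print("soft label at D=1.0:", soft_label(1.0, gmm))
    print("soft label at D=4.0:", soft_label(4.0, gmm))
```

In practice the distances would come from the network's embeddings over the whole training set, and the GMM would be refit as the embeddings change during training. A library mixture model could replace the hand-written EM loop.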