Siamese Networks with Soft Labels for Unsupervised Lesion Detection and Patch Pretraining on Screening Mammograms
Kevin Van Vorst, Li Shen
TL;DR
This work tackles the challenge of pretraining classifiers for mammography when labeled data are scarce by exploiting bilateral symmetry to form unlabeled patch pairs. It introduces a Siamese network with soft labels derived from a Gaussian mixture model fitted to patch-embedding distances, and employs a dual-network setup with cross-regularized losses to stabilize training. On the VinDr and OPTIMAM datasets, the approach yields superior or competitive performance compared to standard self-supervised baselines on downstream tasks such as abnormal vs. normal patch classification, BI-RADS classification, and outcome prediction. The method reduces reliance on costly annotations and demonstrates potential for scalable, unsupervised pretraining in medical imaging, with practical implications for CAD tools and early lesion detection.
Abstract
Self-supervised learning has become a popular way to pretrain a deep learning model and then transfer it to downstream tasks. However, most of these methods are developed on large-scale image datasets that contain natural objects with clear textures, outlines, and distinct color contrasts. It remains uncertain whether these methods are equally effective for medical imaging, where the regions of interest often blend subtly and indistinctly with the surrounding tissues. In this study, we propose an alternative method that uses contralateral mammograms to train a neural network to produce similar embeddings when both images in a pair are normal and different embeddings when a pair contains one normal and one abnormal image. Our approach uses the natural symmetry of the human body as weak labels to learn to distinguish abnormal lesions from background tissues in a fully unsupervised manner. Our findings suggest that this is feasible when soft labels derived from the Euclidean distances between the embeddings of the image pairs are incorporated into the Siamese network loss. Our method demonstrates superior performance in mammogram patch classification compared to existing self-supervised learning methods. This approach not only makes effective use of a vast amount of image data but also minimizes reliance on costly labels, a significant advantage in medical imaging.
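
The idea of turning embedding distances into soft labels can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the two-component 1-D mixture, the EM initialisation, the margin-based contrastive form of the loss, and all function names (fit_gmm_1d, soft_labels, soft_contrastive_loss) are assumptions made for illustration. The sketch fits a Gaussian mixture to pairwise Euclidean distances, treats the posterior of the larger-mean component as the probability that a pair is dissimilar, and uses that probability to weight the two terms of a contrastive loss.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(distances, iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative)."""
    xs = sorted(distances)
    n = len(xs)
    half = n // 2
    # Assumed initialisation: means of the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    overall = sum(xs) / n
    var0 = max(sum((x - overall) ** 2 for x in xs) / n, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances

def soft_labels(distances, params):
    """Posterior probability that each pair belongs to the larger-distance component."""
    weights, means, variances = params
    hi = 0 if means[0] > means[1] else 1
    labels = []
    for d in distances:
        p = [weights[k] * gaussian_pdf(d, means[k], variances[k]) for k in range(2)]
        total = p[0] + p[1]
        labels.append(p[hi] / total if total > 0 else 0.5)
    return labels

def soft_contrastive_loss(distances, labels, margin=2.0):
    """Contrastive loss with soft labels p in [0, 1] (p = probability of 'dissimilar').

    Pairs judged similar are pulled together; pairs judged dissimilar are
    pushed apart until their distance reaches the margin.
    """
    total = 0.0
    for d, p in zip(distances, labels):
        total += (1.0 - p) * d ** 2 + p * max(0.0, margin - d) ** 2
    return total / len(distances)

# Hypothetical Euclidean distances between contralateral patch embeddings.
example_distances = [0.1, 0.2, 0.15, 0.12, 1.9, 2.1, 2.0, 0.18]
example_params = fit_gmm_1d(example_distances)
example_labels = soft_labels(example_distances, example_params)
example_loss = soft_contrastive_loss(example_distances, example_labels)
```

In an actual training loop the distances would come from the Siamese encoder and the loss would be written in a framework with automatic differentiation; the pure-Python version above only shows how the soft labels enter the loss.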
