Metadata-enhanced contrastive learning from retinal optical coherence tomography images
Robbie Holland, Oliver Leingang, Hrvoje Bogunović, Sophie Riedl, Lars Fritsche, Toby Prevost, Hendrik P. N. Scholl, Ursula Schmidt-Erfurth, Sobha Sivaprasad, Andrew J. Lotery, Daniel Rueckert, Martin J. Menten
TL;DR
This work addresses the challenge of applying contrastive self-supervised learning to retinal OCT data by introducing a metadata-enhanced framework that uses longitudinal patient information (identity, eye laterality, and scan timing) to define informative positive and negative pairs. By redefining inter-image relationships using a temporal window $\delta_T$ and excluding ambiguous cross-patient negatives, the authors adapt SimCLR and BYOL to OCT data and report substantial improvements over standard pretraining and a retinal foundation model across seven AMD-related downstream tasks in two large cohorts. The approach is also data-efficient: 20x–100x fewer labeled samples are sometimes sufficient to match or exceed baseline performance, which points to practical potential for label-efficient retinal disease screening and monitoring. The results suggest a generalizable strategy for integrating readily available metadata into self-supervised learning in medical imaging and motivate extensions to other modalities and diseases.
Abstract
Deep learning has the potential to automate screening, monitoring and grading of disease in medical images. Pretraining with contrastive learning enables models to extract robust and generalisable features from natural image datasets, facilitating label-efficient downstream image analysis. However, the direct application of conventional contrastive methods to medical datasets introduces two domain-specific issues. Firstly, several image transformations that have been shown to be crucial for effective contrastive learning do not translate from the natural image domain to the medical image domain. Secondly, the assumption made by conventional methods, that any two images are dissimilar, is systematically misleading in medical datasets depicting the same anatomy and disease. This is exacerbated in longitudinal image datasets that repeatedly image the same patient cohort to monitor disease progression over time. In this paper we tackle these issues by extending conventional contrastive frameworks with a novel metadata-enhanced strategy. Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships. To this end, we employ records for patient identity, eye position (i.e. left or right) and time series information. In experiments using two large longitudinal datasets containing 170,427 retinal OCT images of 7,912 patients with age-related macular degeneration (AMD), we evaluate the utility of using metadata to incorporate the temporal dynamics of disease progression into pretraining. Our metadata-enhanced approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks related to AMD. Due to its modularity, our method can be quickly and cost-effectively tested to establish the potential benefits of including available metadata in contrastive pretraining.
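The idea of using patient identity, eye position and scan timing to decide which image pairs are contrastive positives can be illustrated with a minimal sketch. This is not the authors' implementation. The `Scan` record, the `pair_relation` function, the use of days as the time unit, and in particular the rule for which pairs count as "ambiguous" are all assumptions made here for illustration; the summary above only states that a temporal window $\delta_T$ is used and that ambiguous negatives are excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Hypothetical metadata record for one OCT image."""
    patient_id: str
    eye: str     # "L" or "R"
    day: float   # acquisition time in days (assumed unit)


def pair_relation(a: Scan, b: Scan, delta_t: float) -> str:
    """Label a pair of scans as 'positive', 'negative' or 'ignore'.

    Assumed rule (illustrative only):
      - same patient, same eye, acquired within delta_t -> positive
      - different patients                              -> negative
      - anything else from the same patient             -> ignore
    """
    same_patient = a.patient_id == b.patient_id
    if same_patient and a.eye == b.eye and abs(a.day - b.day) <= delta_t:
        return "positive"
    if not same_patient:
        return "negative"
    # Same patient but other eye, or too far apart in time: the true
    # relationship is unclear, so the pair is left out of the loss.
    return "ignore"


def relation_matrix(scans: list[Scan], delta_t: float) -> list[list[str]]:
    """Pairwise labels for a batch; the diagonal is marked 'self'."""
    n = len(scans)
    return [
        ["self" if i == j else pair_relation(scans[i], scans[j], delta_t)
         for j in range(n)]
        for i in range(n)
    ]
```

In a SimCLR-style loss, a matrix like this could be turned into masks that select which similarity terms enter the numerator (positives) and denominator (positives and negatives), with ignored pairs dropped from both. How the original method applies such masks within SimCLR and BYOL is not specified in the text above.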
