Deep-learning-based clustering of OCT images for biomarker discovery in age-related macular degeneration (Pinnacle study report 4)

Robbie Holland, Rebecca Kaye, Ahmed M. Hagag, Oliver Leingang, Thomas R. P. Taylor, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Hendrik P. N. Scholl, Daniel Rueckert, Andrew J. Lotery, Sobha Sivaprasad, Martin J. Menten

TL;DR

This study addresses the need for prognostic biomarkers beyond traditional AMD grading by leveraging self-supervised contrastive learning on a large OCT dataset. A ResNet-based feature extractor is trained without labels, and images are clustered into 30 groups; two retinal-specialist teams annotate cluster-defining features, revealing both known and novel AMD biomarkers. The clusters demonstrate fine-grained, AMD-related distinctions and, in simulation, offer improved prognostic value for progression to late AMD compared with established grading schemes, while remaining interpretable through GradCAM attributions. The findings suggest that discovery-oriented, data-driven tools can accelerate biomarker discovery and refinement, with potential applicability across diseases and imaging modalities, provided domain-specific retraining is performed.

Abstract

Diseases are currently managed using grading systems, which stratify patients into stages that indicate patient risk and guide clinical management. However, these broad categories typically lack prognostic value, and proposals for new biomarkers are currently limited to anecdotal observations. In this work, we introduce a deep-learning-based biomarker proposal system for the purpose of accelerating biomarker discovery in age-related macular degeneration (AMD). It works by first training a neural network using self-supervised contrastive learning to discover, without any clinical annotations, features relating to both known and unknown AMD biomarkers present in 46,496 retinal optical coherence tomography (OCT) images. To interpret the discovered biomarkers, we partition the images into 30 subsets, termed clusters, that contain similar features. We then conduct two parallel 1.5-hour semi-structured interviews with two independent teams of retinal specialists, who describe each cluster in clinical language. Overall, both teams independently identified clearly distinct characteristics in 27 of 30 clusters, of which 23 were related to AMD. Seven were recognised as known biomarkers already used in established grading systems, and 16 depicted biomarker combinations or subtypes that are either not yet used in grading systems, were only recently proposed, or were unknown. Clusters separated incomplete from complete retinal atrophy, intraretinal from subretinal fluid, and thick from thin choroids, and in simulation outperformed clinically used grading systems in prognostic value. Overall, contrastive learning enabled the automatic proposal of AMD biomarkers that go beyond the set used by clinically established grading systems. Ultimately, we envision that equipping clinicians with discovery-oriented deep-learning tools can accelerate discovery of novel prognostic biomarkers.
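
The partitioning step described in the abstract (grouping images with similar self-supervised features into 30 clusters) can be illustrated with a minimal sketch. The text here does not name the clustering algorithm, so the plain k-means below is an assumption, not the authors' confirmed method. The random `embeddings` array is a synthetic stand-in for the features a trained encoder would produce, and the function name `kmeans`, the sample count and the feature dimension are all hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed algorithm); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct, randomly chosen samples.
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment so the labels match the returned centroids.
    diffs = features[:, None, :] - centroids[None, :, :]
    labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids


# Synthetic stand-in for encoder outputs: 1500 vectors of dimension 32.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1500, 32))
# L2-normalise so distances depend on direction rather than magnitude.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

labels, centroids = kmeans(embeddings, k=30)
cluster_sizes = np.bincount(labels, minlength=30)
print(cluster_sizes)
```

In a real pipeline the synthetic array would be replaced by one feature vector per OCT image, and each resulting cluster would then be sampled for review by the specialist teams.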

Paper Structure (15 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: In this study we design a biomarker proposal system based on contrastive learning. After self-supervised pretraining, we cluster images that share similar sets of features. Finally, two teams of retinal specialists independently identify the characteristic features of each cluster that potentially relate to new biomarkers.
  • Figure 2: The fully automated backbone of our biomarker proposal system consists of two stages. Firstly, self-supervised contrastive learning trains models to identify biomarkers and other image features without any clinical annotations. To do so, it trains networks to ignore a specified set of known invariant image features, defined by the set of contrastive transformations, amplifying the signal of any biomarker-related features. Secondly, we extract self-supervised image features and cluster images with similar features. After this, we compute attribution maps that highlight the cluster-specific features in each image to assist interpretation by retinal specialists.
  • Figure 3: Each cluster with its description, derived independently by two teams of retinal specialists. For each cluster we show four representative images from different patients. Our proposal system identified these clusters without any human supervision or prior knowledge of known biomarkers through a process of self-discovery. Out of 30 clusters, 23 were related to AMD, of which 16 made subtle distinctions between fine-grained biomarkers that were either unknown to retinal specialists or not included in existing clinical grading systems.
  • Figure 4: Images from ten randomly drawn patients from six clusters, as shown to retinal specialists during the cluster interpretation interviews. The clusters were largely homogeneous and had identifiable features that described the majority of the images (written to the left). This was reinforced by cluster-specific attribution maps (below each image), which indicate that a consistent set of self-supervised features defines each cluster.
  • Figure 5: Clusters were correlated with known biomarker annotations and disease stages using conditional probability. By comparing clusters that are indistinguishable to current grading systems but are, by definition, distinguishable by their self-supervised features, we hope to identify new biomarker subtypes that are currently conflated within each disease stage.
  • ...and 1 more figure
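
The Figure 5 caption mentions correlating clusters with known annotations and disease stages using conditional probability. A minimal sketch of one way to compute such a table, P(stage | cluster), from per-image cluster assignments and stage labels is given below. The function name `conditional_probability` and the toy arrays are hypothetical illustrations, not data or code from the study.

```python
import numpy as np


def conditional_probability(cluster_ids, annotations, n_clusters, n_classes):
    """Return an (n_clusters, n_classes) matrix of P(annotation | cluster)."""
    counts = np.zeros((n_clusters, n_classes), dtype=float)
    # Count how often each (cluster, annotation) pair occurs.
    np.add.at(counts, (cluster_ids, annotations), 1.0)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows for empty clusters stay at zero instead of dividing by zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)


# Toy data (hypothetical): 8 images, 3 clusters, 2 disease stages.
cluster_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
stages = np.array([0, 0, 1, 1, 1, 0, 1, 1])

p_stage_given_cluster = conditional_probability(
    cluster_ids, stages, n_clusters=3, n_classes=2
)
print(p_stage_given_cluster)
```

Two clusters whose rows look alike in this table share the same stage profile under the existing grading, yet they were separated by their learned features; under the caption's reasoning, such pairs are the candidates for conflated biomarker subtypes.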