DISCOVER: 2-D Multiview Summarization of Optical Coherence Tomography Angiography for Automatic Diabetic Retinopathy Diagnosis

Mostafa El Habib Daho, Yihao Li, Rachid Zeghlache, Hugo Le Boité, Pierre Deman, Laurent Borderie, Hugang Ren, Niranchana Mannivanan, Capucine Lepicard, Béatrice Cochener, Aude Couturier, Ramin Tadayoni, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec

TL;DR

This work tackles automatic DR severity assessment from 3-D OCTA data by introducing DISCOVER, a framework that first summarizes volumetric OCTA into a trainable 2-D en-face projection and then performs DR classification with an interpretable, attribution-guided refinement via selected B-scans. The method combines a 3-D→2-D projection network with two 2-D classifier ensembles, using model dropout and differentiable data augmentation to maximize generalization, and employs attribution methods to identify salient B-scans for secondary classification. Empirical results on the EviRed OCTA dataset show that DISCOVER matches or surpasses 3-D baselines while being faster and more interpretable, with notable gains for detecting mild NPDR and PDR. The approach demonstrates strong potential for clinical decision support and interpretability in 3-D medical imaging contexts, and lays groundwork for future longitudinal and Transformer-based extensions.

Abstract

Diabetic Retinopathy (DR), an ocular complication of diabetes, is a leading cause of blindness worldwide. Traditionally, DR is monitored using Color Fundus Photography (CFP), a widespread 2-D imaging modality. However, DR classifications based on CFP have poor predictive power, resulting in suboptimal DR management. Optical Coherence Tomography Angiography (OCTA) is a recent 3-D imaging modality offering enhanced structural and functional information (blood flow) with a wider field of view. This paper investigates automatic DR severity assessment using 3-D OCTA. A straightforward solution to this task is a 3-D neural network classifier. However, 3-D architectures have numerous parameters and typically require many training samples. A lighter solution consists in using 2-D neural network classifiers processing 2-D en-face (or frontal) projections and/or 2-D cross-sectional slices. Such an approach mimics the way ophthalmologists analyze OCTA acquisitions: 1) en-face flow maps are often used to detect avascular zones and neovascularization, and 2) cross-sectional slices are commonly analyzed to detect macular edemas, for instance. However, arbitrary data reduction or selection might result in information loss. Two complementary strategies are thus proposed to optimally summarize OCTA volumes with 2-D images: 1) a parametric en-face projection optimized through deep learning and 2) a cross-sectional slice selection process controlled through gradient-based attribution. The full summarization and DR classification pipeline is trained from end to end. The automatic 2-D summary can be displayed in a viewer or printed in a report to support the decision. We show that the proposed 2-D summarization and classification pipeline outperforms direct 3-D classification with the advantage of improved interpretability.
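The second strategy above, slice selection driven by gradient-based attribution, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear scorer, so the gradient with respect to the volume is simply the weight tensor. It also assumes B-scans are indexed along axis 1 and uses a gradient-times-input relevance. The function name `select_bscans` and all shapes are hypothetical.

```python
import numpy as np

def select_bscans(volume, grad, k=3, bscan_axis=1):
    """Rank B-scans by summed |gradient x input| and return the top-k indices.

    volume, grad: arrays of identical shape (assumed layout: X, Y, Z).
    bscan_axis:   axis that indexes B-scans (an assumption of this sketch).
    """
    attribution = np.abs(grad * volume)
    # Sum the attribution over every axis except the one indexing B-scans.
    other_axes = tuple(a for a in range(volume.ndim) if a != bscan_axis)
    relevance = attribution.sum(axis=other_axes)
    top = np.argsort(relevance)[::-1][:k]
    return np.sort(top), relevance

# Toy example: a linear scorer s = sum(weights * volume), whose gradient
# with respect to the volume equals `weights`.
rng = np.random.default_rng(0)
volume = rng.random((8, 6, 4))
weights = np.zeros_like(volume)
weights[:, 2, :] = 1.0   # B-scan 2 drives the score most
weights[:, 4, :] = 0.5   # B-scan 4 contributes less
selected, relevance = select_bscans(volume, weights, k=2)
print(selected)
```

In a real pipeline the gradient would come from automatic differentiation of the trained classifier rather than from a hand-written weight tensor, and the selected B-scans would be passed to the second classification branch.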

Paper Structure

This paper contains 30 sections, 10 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Geometry of an Optical Coherence Tomography (OCT) acquisition. A 3-D volume consists of multiple 2-D B-scans, which in turn consist of multiple 1-D A-scans.
  • Figure 2: Overview of the proposed approach. A multi-channel 3-D volume is summarized as a 2-D image through a 3-D$\rightarrow$2-D projection network, detailed in Fig. 4. Next, a first classification branch classifies this 2-D summary image in order to produce a DR diagnosis. Through an attribution method, the most relevant 2-D B-scans are selected. Then, a second classification branch classifies the selected B-scans to improve the DR diagnosis. Each classification branch is an ensemble of classifiers, detailed in Fig. 5. In this figure, each 3-D channel in the input volume is represented by its Maximum Intensity Projection (MIP).
  • Figure 3: Preprocessing pipeline for OCTA acquisitions (see the preprocessing section) illustrated on one B-scan. Each original 2-D flow and structure B-scan is flattened, masked out, and cropped. The original 2-D LSO en-face localizer is transformed into a 3-D volume by duplicating pixel intensities along the depth axis (within the masked region). A 3-channel 3-D volume is obtained by stacking the resulting three volumes (flow, structure, LSO).
  • Figure 4: Architecture of the 3-D$\rightarrow$2-D projection network, detailed in the projection section. The input is the preprocessed acquisition, a 3-channel volume of $X \times Y_1 \times Z$ voxels, with $Y_1 = 224$ in this example. Parameter $\Phi$, the number of filters in the first block, controls the complexity of the projection network. The figure on the right illustrates the size of the data tensors at the output of each block.
  • Figure 5: Ensemble of classification networks $\{\boldsymbol{\gamma}_k,\ k = 1, \dots, K\}$ with model dropout (controlled by random parameters $\delta_k,\ k = 1, \dots, K$) and differentiable random transformations (affine transformations and horizontal flips, controlled by random parameters $\varepsilon_k,\ k = 1, \dots, K$). This pipeline, detailed in the section on classifying the projection, is illustrated for the first classification branch of Fig. 2, in which the input images are 2-D summary images.
  • ...and 5 more figures
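
The channel stacking described in the Figure 3 caption and the Maximum Intensity Projection mentioned in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' preprocessing code. It assumes the depth axis is the last axis of each volume and that a binary mask is already available, and it omits flattening and cropping. The function names `build_input_volume` and `max_intensity_projection` are hypothetical.

```python
import numpy as np

def build_input_volume(flow, structure, lso, mask):
    """Stack flow, structure, and LSO into one 3-channel volume.

    flow, structure, mask: arrays of shape (H, W, D), depth last (assumed).
    lso:                   2-D en-face localizer of shape (H, W).
    """
    # Duplicate the 2-D LSO image along depth, keeping only masked voxels.
    lso_volume = lso[:, :, None] * mask
    return np.stack([flow * mask, structure * mask, lso_volume], axis=0)

def max_intensity_projection(volume, depth_axis=-1):
    """Collapse the depth axis by keeping the maximum along it."""
    return volume.max(axis=depth_axis)

# Toy data: a 4 x 5 en-face grid with 3 depth samples.
rng = np.random.default_rng(0)
flow = rng.random((4, 5, 3))
structure = rng.random((4, 5, 3))
lso = rng.random((4, 5))
mask = np.zeros((4, 5, 3))
mask[..., 1:] = 1.0  # pretend the first depth sample lies outside the retina

volume = build_input_volume(flow, structure, lso, mask)
mip = max_intensity_projection(volume)
print(volume.shape, mip.shape)  # (3, 4, 5, 3) (3, 4, 5)
```

The MIP is a fixed, non-parametric projection. Per the abstract, the proposed en-face projection is instead parametric and optimized through deep learning.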