Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

TL;DR

The paper addresses predicting the future risk of eye disease from longitudinal fundus imaging with a Transformer-based survival framework, LTSA. LTSA encodes irregularly spaced visit sequences with a temporal positional encoder, applies causal attention to model relationships among images over time, and outputs discrete hazard distributions from which eye-specific survival curves are derived. The model is trained with a survival loss together with a step-ahead feature prediction loss and is evaluated on the AREDS and OHTS datasets, where it shows consistent improvements over a single-image baseline across multiple time horizons, with time-dependent concordance indices $C(t, \Delta t)$ reported near 0.88–0.91. A temporal attention analysis indicates that the most recent visit carries the most predictive information, while earlier visits still contribute with decaying weight, supporting the claim that longitudinal modeling yields dynamic and explainable prognoses for AMD and POAG.
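The step from discrete hazards to a survival curve mentioned above can be illustrated with a minimal sketch. It assumes the standard discrete-time relation $S_k = \prod_{j \le k} (1 - h_j)$; the function name `survival_from_hazards` and the hazard values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate
from operator import mul


def survival_from_hazards(hazards):
    """Convert per-interval hazards h_1..h_K into survival probabilities S_1..S_K.

    Assumes the standard discrete-time relation S_k = prod_{j<=k} (1 - h_j).
    """
    if any(not 0.0 <= h <= 1.0 for h in hazards):
        raise ValueError("each hazard must lie in [0, 1]")
    # Running product of the per-interval "no event" probabilities.
    return list(accumulate((1.0 - h for h in hazards), mul))


# Made-up hazards for three consecutive time intervals (illustration only).
hazards = [0.1, 0.2, 0.5]
survival = survival_from_hazards(hazards)
print(survival)
```

Because every factor $1 - h_j$ lies in $[0, 1]$, the resulting curve is non-increasing, which is what a survival curve requires.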

Abstract

Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression, and forecasting the future risk of developing disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.

Paper Structure

This paper contains 16 sections, 18 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Stages of AMD progression. Color fundus photography images illustrating the various stages of AMD, a progressive eye disease affecting the macula. Images come from the AREDS dataset accompanied by a 9-step AMD severity score; a score over 9 indicates late-stage AMD, which can cause blurring and loss of central vision. Green boxes highlight "drusen", yellowish deposits of protein under the retina, which can be an early sign of AMD. There are two forms of late AMD: "dry", or atrophic, AMD (also called geographic atrophy) and "wet", or neovascular, AMD. AMD = age-related macular degeneration; AREDS = Age-Related Eye Disease Study.
  • Figure 1: Auxiliary late AMD prognosis results. Time-dependent Brier score $B(t, \Delta t)$ for various values of prediction time $t$ and evaluation time $\Delta t$ comparing the single-image baseline model (blue) to LTSA, which incorporates all prior visits (orange). Box plots depict the values computed from 1,000 bootstrap samples of the test set (center line = median, box = IQR, whiskers = 1.5x the IQR from the box). Significance levels are determined from Bonferroni-adjusted $P$-values as follows: **** = $P \leq 0.0001$, *** = $P \leq 0.001$, ** = $P \leq 0.01$, * = $P \leq 0.05$, ns = no significant difference. AMD = age-related macular degeneration; IQR = interquartile range.
  • Figure 2: Overview of proposed longitudinal survival analysis approach. In longitudinal medical imaging, patients undergo repeated imaging over long periods of time at irregular intervals (a). Rather than predict the presence of disease at the time of imaging, our method leverages a patient's longitudinal imaging history to forecast the future risk of developing disease through a survival analysis framework (b). Our approach represents the collection of fundus images for an eye over time as a sequence fit for modeling with Transformers. To accommodate large, irregular intervals between consecutive visits, a temporal positional encoder fuses this information with the image embeddings from each visit. A Transformer encoder then employs causal temporal attention over the sequence, only attending to prior visits. The entire model is optimized end-to-end to predict the time-varying hazard function for each unique sequence of consecutive visits. From the hazard function, we compute eye-specific survival curves, allowing for dynamic eye disease risk prognosis evaluated through the framework of longitudinal survival analysis (c).
  • Figure 2: Auxiliary POAG prognosis results. Time-dependent Brier score $B(t, \Delta t)$ for various values of prediction time $t$ and evaluation time $\Delta t$ comparing the single-image baseline model (blue) to LTSA, which incorporates all prior visits (orange). Box plots depict the values computed from 1,000 bootstrap samples of the test set (center line = median, box = IQR, whiskers = 1.5x the IQR from the box). Significance levels are determined from Bonferroni-adjusted $P$-values as follows: **** = $P \leq 0.0001$, *** = $P \leq 0.001$, ** = $P \leq 0.01$, * = $P \leq 0.05$, ns = no significant difference. IQR = interquartile range; POAG = primary open-angle glaucoma.
  • Figure 3: Late AMD prognosis results. Time-dependent concordance index $C(t, \Delta t)$ for various values of prediction time $t$ and evaluation time $\Delta t$ comparing the single-image baseline model (blue) to LTSA, which incorporates all prior visits (orange). Box plots depict the values computed from 1,000 bootstrap samples of the test set (center line = median, box = IQR, whiskers = 1.5x the IQR from the box). Significance levels are determined from Bonferroni-adjusted $P$-values as follows: **** = $P \leq 0.0001$, *** = $P \leq 0.001$, ** = $P \leq 0.01$, * = $P \leq 0.05$, ns = no significant difference. AMD = age-related macular degeneration; IQR = interquartile range.
  • ...and 4 more figures
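
The causal temporal attention described in the overview caption (each visit attends only to itself and prior visits) can be sketched as a masked softmax. This is a minimal illustration in pure Python and not the authors' implementation; the function name `causal_attention_weights` and the score values are hypothetical.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with future visits masked.

    scores[i][j] is the raw attention score of visit i (query) toward
    visit j (key). Visit i may only attend to visits j <= i.
    """
    n = len(scores)
    if any(len(row) != n for row in scores):
        raise ValueError("scores must be a square matrix")
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]  # current and earlier visits only
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future visits (j > i) receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


# Made-up scores for an eye with three visits; equal scores make the mask easy to see.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = causal_attention_weights(scores)
for row in weights:
    print(row)
```

With equal scores, the first visit places all of its weight on itself, the second splits its weight evenly between the first two visits, and no row assigns weight to a later visit.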