Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling
Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng
TL;DR
The paper tackles predicting the future risk of eye disease from longitudinal fundus imaging with a Transformer-based survival framework, LTSA. LTSA encodes irregularly spaced visit sequences with a temporal positional encoder, applies causal attention to model time-varying relationships between images, and outputs discrete hazard distributions from which eye-specific survival curves are derived. The model is trained with a survival loss together with a step-ahead feature prediction loss. Evaluated on the AREDS and OHTS datasets, it shows consistent improvements over a single-image baseline across multiple time horizons, with time-dependent concordance indices $C(t,\Delta t)$ near 0.88–0.91. A temporal attention analysis indicates that most predictive information lies in the most recent visit, while prior visits still contribute with decaying weight, supporting the claim that longitudinal modeling yields dynamic and explainable prognoses for AMD and POAG.
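To make the step from discrete hazards to a survival curve concrete, the following is a minimal sketch using the standard discrete-time relation $S(t_k) = \prod_{j \le k} (1 - h_j)$. It is an illustration under stated assumptions, not the authors' implementation: the function name `survival_from_hazards` and the hazard values are hypothetical.

```python
import numpy as np

def survival_from_hazards(hazards):
    """Convert per-interval discrete hazards h_k into S_k = prod_{j<=k} (1 - h_j)."""
    hazards = np.asarray(hazards, dtype=float)
    if np.any((hazards < 0) | (hazards > 1)):
        raise ValueError("hazards must lie in [0, 1]")
    # Surviving past interval k requires surviving every interval up to k.
    return np.cumprod(1.0 - hazards, axis=-1)

# Hypothetical hazards for four consecutive time intervals (not from the paper).
hazards = [0.1, 0.2, 0.0, 0.5]
survival = survival_from_hazards(hazards)
```

Because each hazard lies in [0, 1], the resulting curve is non-increasing, which is what a survival function requires.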
Abstract
Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression, and forecasting the future risk of developing disease is critical for properly planning treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.
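One way to picture how a sequence model can handle visits at irregular intervals is to evaluate a sinusoidal encoding at the real-valued visit time rather than at an integer position, and to restrict attention so that each visit sees only itself and earlier visits. The sketch below assumes this general approach; the function names, the encoding dimension, the `max_period` constant, and the visit times are all hypothetical and are not taken from the paper.

```python
import numpy as np

def temporal_encoding(visit_times, dim=8, max_period=10000.0):
    """Sinusoidal encoding evaluated at real-valued visit times (dim must be even)."""
    t = np.asarray(visit_times, dtype=float)[:, None]        # shape (n_visits, 1)
    freqs = max_period ** (-np.arange(0, dim, 2) / dim)      # shape (dim / 2,)
    angles = t * freqs[None, :]
    enc = np.zeros((t.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)   # even channels
    enc[:, 1::2] = np.cos(angles)   # odd channels
    return enc

def causal_mask(n_visits):
    """Entry (i, j) is True when visit i may attend to visit j, i.e. j <= i."""
    return np.tril(np.ones((n_visits, n_visits), dtype=bool))

# Hypothetical visit times in months since baseline, irregularly spaced.
visit_months = [0, 6, 14, 30]
enc = temporal_encoding(visit_months)
mask = causal_mask(len(visit_months))
```

With this mask, the prediction at any visit depends only on imaging available up to that visit, which is what allows the prognosis to be updated as new images arrive.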
