Detection of diabetic retinopathy using longitudinal self-supervised learning
Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Gwenolé Quellec, Mathieu Lamard
TL;DR
This work investigates longitudinal self-supervised learning (LSSL) for detecting early diabetic retinopathy (DR) progression from pairs of consecutive color fundus photographs (CFPs). It compares three longitudinal pretext tasks (Longitudinal Siamese, LSSL with a trajectory direction constraint, and Longitudinal Neighbourhood Embedding), each using a shared encoder to produce latent trajectory representations $\Delta z$ that capture disease dynamics. On the OPHDIAT dataset, the LSSL variants, particularly the Zhao (2021) variant, achieve the best performance (AUC $\approx 0.962$) for detecting early change from no/mild DR to more severe DR, significantly outperforming the baselines and highlighting the latent space's capacity to encode progression. The results suggest that appropriately aligned longitudinal latent representations can reflect DR progression and could enhance early screening and patient-specific management, although hyperparameter sensitivity and latent-space disentanglement remain open challenges. Overall, the study demonstrates the potential of longitudinal self-supervision to extract clinically relevant dynamic information from routinely collected CFPs.
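As a rough illustration of the latent trajectory $\Delta z$ mentioned above, the following minimal sketch encodes two consecutive exams with one shared encoder and measures how well the resulting change aligns with a direction vector. The toy linear encoder, the direction `tau`, the penalty `1 - cos` and all numeric values are assumptions made for illustration only; they are not the architecture or loss used in the study.

```python
import math

def encode(image, weights):
    """Toy linear encoder shared by both visits (stand-in for a CNN encoder)."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def latent_trajectory(z_first, z_second):
    """Delta z: change in the latent code between two consecutive exams."""
    return [b - a for a, b in zip(z_first, z_second)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def direction_loss(delta_z, tau):
    """Assumed alignment penalty: near 0 when delta_z points along tau, 2 when opposite."""
    return 1.0 - cosine(delta_z, tau)

# Hypothetical toy data: 3-value "images" and a 2-D latent space.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
exam_t1 = [0.2, 0.1, 0.0]   # earlier exam
exam_t2 = [0.2, 0.4, 0.3]   # later exam of the same eye
tau = [0.0, 1.0]            # assumed progression direction in latent space

z1 = encode(exam_t1, weights)   # same weights for both exams: shared encoder
z2 = encode(exam_t2, weights)
delta_z = latent_trajectory(z1, z2)
loss = direction_loss(delta_z, tau)
```

In this toy case the change between exams lies entirely along `tau`, so the penalty is close to zero; a trajectory pointing the opposite way would be penalized most.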
Abstract
Longitudinal imaging can capture both static anatomical structures and dynamic changes in disease progression, supporting earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of longitudinal self-supervised learning for DR diagnosis. We compare different longitudinal self-supervised learning (LSSL) methods that model disease progression from longitudinal retinal color fundus photographs (CFP), with the goal of detecting early DR severity changes from a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset, with and without encoders pretrained on a longitudinal pretext task (LSSL). The baseline (model trained from scratch) achieves an AUC of 0.875, whereas early fusion with a simple ResNet-like architecture and frozen LSSL weights achieves an AUC of 0.96 (95% CI: 0.9593-0.9655, DeLong test; p-value < 2.2e-16), suggesting that the LSSL latent space can encode the dynamics of DR progression.
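For readers less familiar with the metric reported above, the sketch below computes an AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `roc_auc`, the labels and the scores are hypothetical toy values, not data from the study, and the DeLong confidence interval is not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy example: 1 = severity change between exams, 0 = no change.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of exams and is meant only to make the definition explicit; a rank-based implementation would be preferable on a screening dataset of realistic size.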
