Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Zhihao Zhao, Junjie Yang, Shahrooz Faghihroohi, Yinzheng Zhao, Daniel Zapp, Kai Huang, Nassir Navab, M. Ali Nasseri

TL;DR

A novel diffusion-based model to predict prospective images by extrapolating from existing longitudinal fundus images of patients, which not only effectively generates longitudinal data but also significantly improves the precision of downstream classification tasks.

Abstract

The utilization of longitudinal datasets for glaucoma progression prediction offers a compelling approach to support early therapeutic interventions. Predominant methodologies in this domain have primarily focused on the direct prediction of glaucoma stage labels from longitudinal datasets. However, such methods may not adequately encapsulate the nuanced developmental trajectory of the disease. To enhance the diagnostic acumen of medical practitioners, we propose a novel diffusion-based model to predict prospective images by extrapolating from existing longitudinal fundus images of patients. The methodology delineated in this study distinctively leverages sequences of images as inputs. Subsequently, a time-aligned mask is employed to select a specific year for image generation. During the training phase, the time-aligned mask resolves the issue of irregular temporal intervals in longitudinal image sequence sampling. Additionally, we utilize a strategy of randomly masking a frame in the sequence to establish the ground truth. This methodology aids the network in continuously acquiring knowledge regarding the internal relationships among the sequences throughout the learning phase. Moreover, the introduction of textual labels is instrumental in categorizing images generated within the sequence. The empirical findings from the conducted experiments indicate that our proposed model not only effectively generates longitudinal data but also significantly improves the precision of downstream classification tasks.

Paper Structure

This paper contains 19 sections, 3 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of our proposed $DILS$, whose architecture is based on a 3D diffusion model. The network's input comprises a sequence of glaucoma images and a time-aligned mask $M_A$. The image sequence is encoded into the latent space $Z_0$. During the training phase, the time-aligned mask $M_A$ randomly selects a frame from the sequence to be hidden, resulting in a new mask $M$. $\hat{M}$ is obtained by expanding $M$ to the same dimensions as $Z_0$. To better control the attributes of the generated sequence frames, we provide a label mask $M_L$.
  • Figure 2: Temporal attention module for capturing information related to disease progression within the sequence.
  • Figure 3: Visualization of glaucoma progression prediction. The first row presents a sequence of images of the patient from various past time periods. The second row shows the ground truth (GT), which represents the current fundus image of the patient; the other images depict the prediction from different methods.
  • Figure 4: Results of prediction with and without label mask.
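
The masking steps described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the time-aligned mask $M_A$ is a 0/1 vector over fixed yearly slots (1 where a fundus image exists), that hiding a frame means setting one observed slot to 0 to obtain $M$, and that $\hat{M}$ is $M$ broadcast to a latent tensor of shape `(T, C, H, W)`. The function names, the 1-means-observed convention, the example years, and the latent shape are all hypothetical.

```python
import numpy as np

def time_aligned_mask(visit_years, start_year, num_slots):
    """Build M_A: 1 at every yearly slot that has an observed image (assumed convention)."""
    mask = np.zeros(num_slots, dtype=np.int64)
    for year in visit_years:
        idx = year - start_year
        if 0 <= idx < num_slots:
            mask[idx] = 1
    return mask

def hide_random_frame(m_a, rng):
    """Hide one observed slot; return the new mask M and the hidden slot index."""
    observed = np.flatnonzero(m_a)          # indices of slots that hold an image
    hidden = int(rng.choice(observed))      # frame that serves as ground truth
    m = m_a.copy()
    m[hidden] = 0
    return m, hidden

def expand_mask(m, latent_shape):
    """Broadcast the per-frame mask M to the latent shape (T, C, H, W) to get M_hat."""
    return np.broadcast_to(m[:, None, None, None], latent_shape).copy()

# Irregular visits (hypothetical years) placed on an 8-slot yearly timeline.
rng = np.random.default_rng(0)
m_a = time_aligned_mask([2015, 2016, 2019, 2021], start_year=2015, num_slots=8)
m, hidden = hide_random_frame(m_a, rng)
m_hat = expand_mask(m, (8, 4, 32, 32))
```

Placing irregular visits on a regular yearly grid lets sequences with different gaps share one tensor layout, and the hidden slot marks the frame that the model would be asked to reconstruct during training.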