Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis

Chengzhi Liu, Zile Huang, Zhe Chen, Feilong Tang, Yu Tian, Zhongxing Xu, Zihong Luo, Yalin Zheng, Yanda Meng

TL;DR

This work tackles the challenge of incomplete multimodal data in ophthalmic disease diagnosis by identifying two main limitations of prior methods: implicit representation constraints and modality heterogeneity. It introduces Incomplete Modality Disentangled Representation (IMDR), which explicitly disentangles features into modal-shared and modal-specific components via a Disentangle Extraction layer guided by a joint distribution, and uses mutual information to preserve modality-specific information. A Joint Proxy Learning (JPL) module further reduces intra-modality redundancy by aligning features with class-specific proxies, enabling robust distillation from a teacher model trained on complete data to a student model handling incomplete inputs. The approach yields state-of-the-art results on four ophthalmology multimodal datasets, showing improved accuracy and specificity under both inter- and intra-modality incompleteness, and is supported by qualitative attention visualizations. Overall, IMDR provides a principled, scalable framework for robust multimodal ophthalmic diagnosis in realistic settings with missing data.

Abstract

Ophthalmologists typically require multimodal data sources to improve diagnostic accuracy in clinical decisions. However, due to medical device shortages, low-quality data and data privacy concerns, missing data modalities are common in real-world scenarios. Existing deep learning methods tend to address this problem by learning an implicit latent subspace representation for different modality combinations. We identify two significant limitations of these methods: (1) implicit representation constraints that hinder the model's ability to capture modality-specific information and (2) modality heterogeneity, which causes distribution gaps and redundancy in feature representations. To address these, we propose an Incomplete Modality Disentangled Representation (IMDR) strategy, which disentangles features into explicit, independent modal-common and modal-specific features under the guidance of mutual information, distilling informative knowledge and enabling the model to reconstruct valuable missing semantics and produce robust multimodal representations. Furthermore, we introduce a joint proxy learning module that assists IMDR in eliminating intra-modality redundancy by exploiting the extracted proxies from each class. Experiments on four ophthalmology multimodal datasets demonstrate that the proposed IMDR significantly outperforms state-of-the-art methods.

Paper Structure

This paper contains 21 sections, 16 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: (a) Vanilla latent subspace methods. (b) Our proposed IMDR strategy effectively decouples multimodal data by employing explicit constraints to minimize mutual information in a Disentangle Extraction layer, guided by a joint distribution. (c) Illustration of intra-modality and inter-modality inter-channel distances between encoder feature maps. “Single” is the model that trains the encoders of each modality independently, providing the ideal feature diversity without inter-modality interference. “:A” denotes the histogram mean. Lower inter-channel similarity means higher diversity. More details are given in Appendix B.
  • Figure 2: Overview of our proposed framework. (a): We train a teacher model using complete modality data, followed by co-training with a student model on incomplete inputs for knowledge distillation. The distillation is supervised by the feature loss $\mathcal{L}_{\text{Feat}}$ and the logit loss $\mathcal{L}_{\text{Logit}}$. During the training of the teacher model, the encoders output the single-modality features $e^f$ and $e^O$. We build a set of proxies for a modality, with each set representing a class. Positive proxies are selected by a similarity matrix between $\hat{e}$ and $e$. All proxies are optimized through the proxy loss $\mathcal{L}_{\text{Prox}}$. $\hat{e}^{f,+}$ and $\hat{e}^{O,+}$, together with the features $e^f$ and $e^O$, are then passed to the IMDR. (b): Details of the IMDR strategy. We estimate the distributions of $\hat{e}^{f,+}$ and $\hat{e}^{O,+}$, then combine them using Eq. \ref{eq: poe} to obtain the joint distribution $\mathcal{P}(\hat{e} \vert x^{f}, x^{O})$. The modality-shared feature $s$ is sampled from this distribution. This feature $s$ guides the decoupling via an attention layer, supervised by the loss $\mathcal{L}_{\text{MI}}$ to minimize the mutual information between the extracted shared features $\hat{s}$ and the specific features $(\mathcal{R}^f, \mathcal{R}^O)$, as well as between $\mathcal{R}^f$ and $\mathcal{R}^O$.
  • Figure 3: Comparative visualization of attention maps under the inter-modality incompleteness setting. The first row shows the AMD dataset and the second row shows the Glaucoma dataset.
  • Figure 4: The comparison of performance across various missing rates under intra-modality incompleteness.
  • Figure 5: Visualization for the ablation study with the OCT modality missing, on the Harvard-30k AMD test set.
  • ...and 1 more figure
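
The Figure 2 caption describes combining two per-modality distributions into a joint distribution and sampling the modality-shared feature $s$ from it. The equation label `poe` suggests a product of experts, but the equation itself is not reproduced here, so the following is only a minimal sketch under stated assumptions: each expert is taken to be a diagonal Gaussian parameterized by a mean and a log-variance, and the function names `product_of_experts` and `sample_shared` are hypothetical, not taken from the paper.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Fuse diagonal Gaussian experts N(mu_i, var_i) into a single Gaussian.

    Assumption: Gaussian experts. For Gaussians, the product has precision
    equal to the sum of the expert precisions and a precision-weighted mean.
    """
    mus = np.asarray(mus, dtype=float)                       # (n_experts, dim)
    precisions = np.exp(-np.asarray(logvars, dtype=float))   # 1 / var_i
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (mus * precisions).sum(axis=0)
    return joint_mu, joint_var

def sample_shared(mu, var, rng):
    """Draw a shared feature with the reparameterization mu + sigma * eps."""
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

# Toy 2-dimensional experts standing in for the two modalities (f and O).
mu_f, logvar_f = np.array([0.0, 2.0]), np.log(np.array([1.0, 0.5]))
mu_o, logvar_o = np.array([2.0, 2.0]), np.log(np.array([1.0, 0.5]))

joint_mu, joint_var = product_of_experts([mu_f, mu_o], [logvar_f, logvar_o])
s = sample_shared(joint_mu, joint_var, np.random.default_rng(0))
```

In this sketch, passing a single expert returns that expert's own mean and variance, which is one simple way a fused distribution can remain defined when a modality is absent. Whether the paper handles missing modalities this way is not stated in the caption.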