PSScreen: Partially Supervised Multiple Retinal Disease Screening

Boyi Zheng, Qing Liu

TL;DR

PSScreen screens for multiple retinal diseases using partially labeled data drawn from diverse domains. It employs a two-stream architecture in which one stream learns deterministic features and the other learns probabilistic features through uncertainty injection. Text-guided semantic decoupling aligns disease-specific visual cues with expert textual knowledge, feature distillation aligns the two streams, and self-distillation transfers task-relevant semantics from the deterministic stream to the probabilistic one. Pseudo label consistency addresses missing labels, and a staged loss schedule supports robust training, yielding state-of-the-art in-domain performance and strong generalization to unseen datasets and ODIR. The approach also demonstrates efficient inference and strong lesion localization, suggesting practical benefits for large-scale, cross-site retinal disease screening. Future work includes expanding to additional imaging modalities and integrating clinician-in-the-loop evaluations to further validate interpretability and clinical utility.

Abstract

Leveraging multiple partially labeled datasets to train a model for multiple retinal disease screening reduces the reliance on fully annotated datasets, but it remains challenging because of significant domain shifts across training datasets from different medical sites and because labels are absent for some classes. To address these challenges, we propose PSScreen, a novel Partially Supervised multiple retinal disease Screening model. PSScreen consists of two streams: one learns deterministic features, and the other learns probabilistic features via uncertainty injection. We then leverage textual guidance to decouple the two types of features into disease-wise features and align them via feature distillation to boost domain generalization. Meanwhile, we employ pseudo label consistency between the two streams to address the absent labels, and we introduce self-distillation to transfer task-relevant semantics about known classes from the deterministic stream to the probabilistic stream, further enhancing detection performance. Experiments show that PSScreen significantly improves average detection performance on six retinal diseases and the normal state, and achieves state-of-the-art results on both in-domain and out-of-domain datasets. Code is available at https://github.com/boyiZheng99/PSScreen.
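
The two label-related objectives mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a missing label is encoded as `None`, that the supervised term is a binary cross-entropy averaged over known classes only, and that consistency is measured as the mean squared gap between the two streams' probabilities on unknown classes. The function names and the squared-gap form are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def known_class_bce(logits, labels):
    """Mean binary cross-entropy over classes whose label is known.

    labels[k] is 1, 0, or None (None = label not available, hypothetical encoding).
    """
    terms = []
    for z, y in zip(logits, labels):
        if y is None:
            continue  # unknown classes contribute nothing to the supervised term
        p = min(max(sigmoid(z), 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0

def unknown_class_consistency(det_logits, prob_logits, labels):
    """Mean squared gap between the two streams on classes without labels.

    The squared-gap form is an assumption made for illustration.
    """
    gaps = [
        (sigmoid(a) - sigmoid(b)) ** 2
        for a, b, y in zip(det_logits, prob_logits, labels)
        if y is None
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0

# One image from a dataset that labels classes 0 and 2 but not class 1.
labels = [1, None, 0]
supervised = known_class_bce([0.0, 5.0, 0.0], labels)
consistency = unknown_class_consistency([0.0, 5.0, 0.0], [0.0, 4.0, 0.0], labels)
```

Masking by label availability lets images from differently annotated datasets share one batch: each image contributes supervised gradients only for the diseases its source dataset annotates, while the remaining classes are constrained by agreement between the two streams.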

Paper Structure

This paper contains 25 sections, 16 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Example open-access datasets for retinal disease screening and a comparison of screening models under three learning paradigms. (a) lists open-access datasets, where "✓" indicates that labels for a disease are available and "?" denotes that they are not. (b) to (d) illustrate the pipeline and characteristics of the fully supervised screening model, usually trained with a fully labeled dataset; the self-supervised screening model, trained with multiple datasets consisting of image-text pairs; and the partially supervised screening model, trained with multiple partially labeled datasets.
  • Figure 2: The framework of PSScreen. (a) illustrates the training pipeline of PSScreen. Deterministic and probabilistic features are extracted from training images by the encoding blocks and the domain shifts with uncertainty (DSU) blocks, decoupled by the text-guided semantic decoupling module, and finally fed to the disease classifier for multi-disease risk prediction. Feature distillation $\mathcal{L}_{f\text{-}dist}$, self-distillation for known classes $\mathcal{L}_{s\text{-}dist}^{known}$, pseudo label consistency for unknown classes $\mathcal{L}_{con}^{unknown}$, and cross entropy loss for known classes $\mathcal{L}_{CE}^{known}$ are applied for model optimization. (b) and (c) show details of the DSU block and the text-guided semantic decoupling block.
  • Figure 3: (a) Performance comparison of zero-shot inference with foundation models on the ODIR200x3 dataset. (b) Visualization of heatmaps generated by MultiHeads and PSScreen for three retinal diseases: age-related macular degeneration (AMD), glaucoma, and pathologic myopia (PM).
  • Figure 4: Example open-access datasets for retinal disease screening and a comparison of screening models under three learning paradigms. (a) lists open-access datasets, where "✓" indicates that labels for a disease are available and "?" denotes that they are not. (b) to (d) illustrate the pipeline and characteristics of the fully supervised screening model, usually trained with a fully labeled dataset; the self-supervised screening model, trained with multiple datasets consisting of image-text pairs; and the partially supervised screening model, trained with multiple partially labeled datasets.
  • Figure 5: The framework of PSScreen. (a) illustrates the training pipeline of PSScreen. Deterministic and probabilistic features are extracted from training images by the encoding blocks and the domain shifts with uncertainty (DSU) blocks, decoupled by the text-guided semantic decoupling module, and finally fed to the disease classifier for multi-disease risk prediction. Feature distillation $\mathcal{L}_{f\text{-}dist}$, self-distillation for known classes $\mathcal{L}_{s\text{-}dist}^{known}$, pseudo label consistency for unknown classes $\mathcal{L}_{con}^{unknown}$, and cross entropy loss for known classes $\mathcal{L}_{CE}^{known}$ are applied for model optimization. (b) and (c) show details of the DSU block and the text-guided semantic decoupling block.
  • ...and 1 more figure
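
The uncertainty injection performed by the DSU blocks named in the framework caption can be sketched roughly as follows. This is a minimal pure-Python illustration that assumes DSU perturbs each channel's mean and standard deviation with Gaussian noise scaled by how much those statistics vary across the batch; the function name `dsu_perturb`, the nested-list layout, and the `eps` value are hypothetical, and the exact block used in PSScreen is the one shown in the paper's figure, not this code.

```python
import math
import random

def _mean(values):
    return sum(values) / len(values)

def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def dsu_perturb(batch, rng, eps=1e-6):
    """Resample per-channel feature statistics (illustrative sketch).

    batch[i][c] is a list of spatial activations for sample i, channel c.
    """
    n_samples, n_channels = len(batch), len(batch[0])
    # Per-sample, per-channel statistics; eps keeps sigma strictly positive.
    mus = [[_mean(ch) for ch in sample] for sample in batch]
    sigmas = [[math.sqrt(_std(ch) ** 2 + eps) for ch in sample] for sample in batch]
    # Uncertainty of each statistic: its spread across the batch, per channel.
    spread_mu = [_std([mus[i][c] for i in range(n_samples)]) for c in range(n_channels)]
    spread_sigma = [_std([sigmas[i][c] for i in range(n_samples)]) for c in range(n_channels)]

    out = []
    for i, sample in enumerate(batch):
        new_sample = []
        for c, channel in enumerate(sample):
            mu, sigma = mus[i][c], sigmas[i][c]
            # Draw perturbed statistics around the observed ones.
            new_mu = mu + rng.gauss(0.0, 1.0) * spread_mu[c]
            new_sigma = sigma + rng.gauss(0.0, 1.0) * spread_sigma[c]
            # Normalize with the old statistics, then re-style with the new ones.
            new_sample.append([new_sigma * (v - mu) / sigma + new_mu for v in channel])
        out.append(new_sample)
    return out

# Two samples with one channel each, as if they came from different sites.
features = [[[1.0, 2.0, 3.0]], [[10.0, 20.0, 60.0]]]
perturbed = dsu_perturb(features, random.Random(0))
```

When every sample in a batch has identical statistics, the spread is zero and the features pass through unchanged; the perturbation grows only when the batch actually mixes differently styled images, which is the situation that arises when training data come from several sites.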