A General Model for Retinal Segmentation and Quantification

Zhonghua Wang, Lie Ju, Sijia Li, Wei Feng, Sijin Zhou, Ming Hu, Jianhao Xiong, Xiaoying Tang, Yifan Peng, Mingquan Lin, Yaodong Ding, Yong Zeng, Wenbin Wei, Li Dong, Zongyuan Ge

TL;DR

RetSAM introduces a unified retinal segmentation-to-quantification framework trained on over 200,000 fundus images to deliver robust multi-target segmentation (anatomical structures, lesions, and phenotypes) and over 30 standardized biomarkers. Built on a Swin Transformer backbone with task-decoupled decoders, it uses a three-stage training pipeline (task-specific experts, pseudo-labeling on large public datasets, and private-task adaptation) to achieve strong cross-dataset and cross-modality generalization. Across 17 public benchmarks and multi-task/multi-domain settings, RetSAM demonstrates superior or competitive segmentation performance, with average Dice similarity coefficient (DSC) gains of 3.9 percentage points and up to 15 percentage points on challenging tasks, while enabling scalable oculomics analyses. The open-source toolkit provides reproducible segmentation-to-quantification and harmonized biomarkers for population-scale retinal research and clinical translation.

Abstract

Retinal imaging is fast, non-invasive, and widely available, offering quantifiable structural and vascular signals for ophthalmic and systemic health assessment. This accessibility creates an opportunity to study how quantitative retinal phenotypes relate to ocular and systemic diseases. However, such analyses remain difficult at scale due to the limited availability of public multi-label datasets and the lack of a unified segmentation-to-quantification pipeline. We present RetSAM, a general retinal segmentation and quantification framework for fundus imaging. It delivers robust multi-target segmentation and standardized biomarker extraction, supporting downstream ophthalmologic studies and oculomics correlation analyses. Trained on over 200,000 fundus images, RetSAM supports three task categories and segments five anatomical structures, four retinal phenotypic patterns, and more than 20 distinct lesion types. It converts these segmentation results into over 30 standardized biomarkers that capture structural morphology, vascular geometry, and degenerative changes. Trained with a multi-stage strategy using both private and public fundus data, RetSAM achieves superior segmentation performance on 17 public datasets. It improves on prior best methods by 3.9 percentage points in DSC on average, with up to 15 percentage points on challenging multi-task benchmarks, and generalizes well across diverse populations, imaging devices, and clinical settings. The resulting biomarkers enable systematic correlation analyses across major ophthalmic diseases, including diabetic retinopathy, age-related macular degeneration, glaucoma, and pathologic myopia. Together, RetSAM transforms fundus images into standardized, interpretable quantitative phenotypes, enabling large-scale ophthalmic research and translation.
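
The segmentation gains above are reported in DSC percentage points. As a reading aid, the following minimal sketch shows how DSC, 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B, and a percentage-point gain are typically computed. It is not the authors' evaluation code: the function names, the empty-mask convention, and the toy masks are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of the two foreground sizes, |A| + |B|.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def gain_in_percentage_points(dsc_new: float, dsc_baseline: float) -> float:
    """Absolute DSC difference expressed in percentage points."""
    return 100.0 * (dsc_new - dsc_baseline)


# Toy 2x4 masks flattened row by row (illustrative values, not paper data).
target = [0, 1, 1, 0, 0, 1, 1, 0]
pred_a = [0, 1, 1, 0, 0, 1, 0, 0]  # misses one foreground pixel
pred_b = [0, 1, 0, 0, 0, 0, 0, 0]  # finds only one foreground pixel

dsc_a = dice_coefficient(pred_a, target)
dsc_b = dice_coefficient(pred_b, target)
gain = gain_in_percentage_points(dsc_a, dsc_b)
```

A percentage-point gain is an absolute difference between two DSC values on a 0 to 100 scale, not a relative improvement, which is how the 3.9 and 15 point figures above should be read.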

Paper Structure (21 sections, 7 figures, 17 tables)

Figures (7)

  • Figure 1: Overview of the RetSAM workflow. (a) Data curation and clinical integration: RetSAM is trained and evaluated on a comprehensive cohort of public and private fundus datasets, utilizing a unified annotation protocol that combines labeled and unlabeled data. Over 50 ophthalmologists from four clinical institutions contributed to the project, participating in data curation, reader studies, and a randomized controlled trial. (b) Task-disease alignment: RetSAM’s segmentation tasks are mapped to clinically relevant ophthalmologic disease manifestations, bridging structural and lesion patterns with diagnostic categories used in routine practice. (c) Segmentation, quantification, and validation: From a single fundus image, RetSAM jointly segments retinal structures and lesion phenotypes to derive over 30 quantitative biomarkers. These biomarkers facilitate diagnosis, disease monitoring, and oculomic association analysis. The system underwent rigorous validation via a multinational reader study and a prospective randomized controlled trial, demonstrating enhanced diagnostic confidence and clinical decision-making efficiency.
  • Figure 2: Comparison of segmentation performance under limited supervision. We evaluate RetSAM against RetFound and SAM2-UNet on the FIVES and REFUGE datasets using different ratios of training data. The red dashed line indicates the specific data fraction required by RetSAM to match the peak performance attained by the best competing method using full data.
  • Figure 3: Qualitative segmentation comparison of RetSAM against RetFound, SAM3, and SAM2-UNet across four downstream tasks. The predicted segmentation masks are overlaid on the original images with distinct colors.
  • Figure 4: Qualitative demonstration of RetSAM's comprehensive segmentation capabilities. Segmentation predictions are displayed as overlays with distinct colors. The figure highlights the diverse range of categories the model can identify, spanning from anatomical structures to pathological lesions and fundus features.
  • Figure 5: Qualitative segmentation results for retinal lesions beyond common DR and AMD-related categories. Segmentation predictions are displayed as white overlays. These examples demonstrate the model's effective coverage of lesion classes which are frequently absent from general-purpose public benchmarks.
  • ...and 2 more figures
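
Figure 1(c) describes deriving quantitative biomarkers from segmentation masks. The summary above does not enumerate those biomarkers, so the sketch below uses two generic mask-derived measurements, a vertical cup-to-disc ratio and a foreground area fraction, purely as assumed examples of the segmentation-to-quantification step. The function names, the choice of optic disc and cup masks, and the toy data are hypothetical and are not taken from the RetSAM toolkit.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 pixels


def vertical_extent(mask: Mask) -> int:
    """Rows spanned from the first to the last row containing foreground."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cup_to_disc_ratio(cup: Mask, disc: Mask) -> float:
    """Ratio of cup height to disc height, both measured in pixel rows."""
    disc_height = vertical_extent(disc)
    if disc_height == 0:
        raise ValueError("optic disc mask is empty")
    return vertical_extent(cup) / disc_height


def area_fraction(mask: Mask) -> float:
    """Share of image pixels labeled as foreground."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(1 for row in mask for v in row if v) / total


# Toy 6x6 masks (illustrative only, not paper data).
disc = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cup = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

vcdr = vertical_cup_to_disc_ratio(cup, disc)
disc_area = area_fraction(disc)
```

Measurements like these are expressed in pixels or as ratios; converting them to physical units would require per-image scale information that this sketch does not model.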