ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

Jiayi Wang, Yi-An Mao, Xiaoyu Ma, Sicen Guo, Yuting Shao, Xiao Lv, Wenting Han, Mark Christopher, Linda M. Zangwill, Yanlong Bi, Rui Fan

TL;DR

The paper tackles robust optic nerve head (ONH) detection across fundus cameras by introducing ODFormer, a Swin Transformer-based semantic segmentation network augmented with a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. It also releases TongjiU-DROD, a large-scale dataset in which each participant's fundus images are captured with two distinct types of cameras and paired with ground-truth ONH annotations, and it establishes an ONH detection benchmark across DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD. Experiments show that ODFormer surpasses state-of-the-art (SoTA) CNNs and Transformers in both performance and generalizability. The dataset, benchmark, and source code give later work on cross-camera ONH detection a common starting point.

Abstract

Optic nerve head (ONH) detection has been a crucial area of study in ophthalmology for years. However, the significant discrepancy between fundus image datasets, each generated using a single type of fundus camera, limits the generalizability of ONH detection approaches built on semantic segmentation networks. Despite numerous recent advances in general-purpose semantic segmentation using convolutional neural networks (CNNs) and Transformers, there is currently no benchmark of these state-of-the-art (SoTA) networks trained specifically for ONH detection. Therefore, in this article, we contribute in three respects: network design, the publication of a dataset, and the establishment of a comprehensive benchmark. Our newly developed ONH detection network, referred to as ODFormer, is based on the Swin Transformer architecture and incorporates two novel components: a multi-scale context aggregator and a lightweight bidirectional feature recalibrator. Our published large-scale dataset, known as TongjiU-DROD, provides multi-resolution fundus images for each participant, captured using two distinct types of cameras. Our established benchmark involves three datasets: DRIONS-DB, DRISHTI-GS1, and TongjiU-DROD, created by researchers from different countries and containing fundus images captured from participants of diverse races and ages. Extensive experimental results demonstrate that our proposed ODFormer surpasses other SoTA networks in both performance and generalizability. Our dataset and source code are publicly available at mias.group/ODFormer.
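
Because the abstract frames ONH detection as semantic segmentation, a predicted ONH mask can be scored against its ground-truth annotation by region overlap. The sketch below is only an illustration: the abstract does not name the evaluation metrics, so the use of intersection over union (IoU) here is an assumption, and the function names `iou` and `mean_iou` are hypothetical, not taken from the ODFormer source code.

```python
from typing import Sequence

# A binary mask as nested sequences: 1 = ONH pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks.

    Assumption: IoU is used purely for illustration; the abstract does
    not state which metrics the benchmark reports.
    """
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average IoU over a non-empty set of (prediction, annotation) pairs."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[0, 1, 1],
            [0, 1, 0]]
    truth = [[0, 1, 0],
             [0, 1, 1]]
    print(iou(pred, truth))  # 2 shared pixels out of 4 in the union
```

Under this assumption, generalizability could be probed by computing the same score on images from a dataset or camera the network was not trained on and comparing it with the in-dataset score.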

Paper Structure (24 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: An overview of our proposed ODFormer.
  • Figure 2: The architecture of the multi-scale context aggregator.
  • Figure 3: The architecture of the lightweight bidirectional feature recalibrator.
  • Figure 4: Examples of fundus images from a single participant captured using two different cameras: (a)-(d) are the fundus images captured using a Zeiss CLARUS 500 fundus camera and their ground-truth annotations; (e)-(h) are the fundus images captured using an NES-1000P handheld mydriasis-free portable fundus camera and their ground-truth annotations.
  • Figure 5: Qualitative experimental results achieved by ONH detection networks trained on the TongjiU-DROD dataset.