Light-weight Retinal Layer Segmentation with Global Reasoning

Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

TL;DR

LightReSeg introduces a lightweight encoder-decoder architecture for retinal layer segmentation that integrates a Transformer-based global reasoning block at the deepest encoder scale with a multi-scale asymmetric attention (MAA) module for robust skip-feature fusion. The design employs depthwise separable and asymmetric convolutions to maintain a small parameter footprint while preserving segmentation accuracy. Across Vis-105H, Glaucoma, and DME datasets, LightReSeg achieves state-of-the-art performance in mIoU and mPA with only 3.3M parameters, and ablation studies confirm substantial gains from the MAA and Transformer components. The work emphasizes practical deployment in clinical OCT devices, highlighting both high accuracy and real-time inference potential, with plans to broaden datasets and improve domain generalization.
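To make the parameter-footprint argument concrete, the following sketch compares weight counts for a standard convolution, a depthwise separable convolution, and an asymmetric (k x 1 followed by 1 x k) factorization. The channel widths and kernel size are hypothetical values chosen for illustration, not taken from the paper, and the helper names are not LightReSeg's actual layers.

```python
# Weight counts (biases omitted) for three ways of building a conv layer.
# All sizes below are illustrative assumptions, not values from the paper.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """A dense k x k convolution mixing all input and output channels."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """A k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    return c_in * k * k + c_in * c_out


def asymmetric_conv_params(c_in: int, c_out: int, k: int) -> int:
    """A k x 1 conv followed by a 1 x k conv, with c_out channels in between."""
    return c_in * c_out * k + c_out * c_out * k


c_in, c_out, k = 64, 64, 3  # hypothetical layer configuration
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
asymmetric = asymmetric_conv_params(c_in, c_out, k)

print(f"standard:   {standard}")
print(f"separable:  {separable}")
print(f"asymmetric: {asymmetric}")
```

For this assumed configuration the separable layer needs roughly an eighth of the weights of the dense layer, and the asymmetric pair needs two thirds, which is the kind of saving such a design relies on.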

Abstract

Automatic retinal layer segmentation in medical images, such as optical coherence tomography (OCT) images, is an important tool for diagnosing ophthalmic diseases. However, accurate segmentation is challenging because of the low contrast and blood-flow noise present in the images. In addition, the algorithm should be light-weight so that it can be deployed in practical clinical applications. It is therefore desirable to design a light-weight network with high performance for retinal layer segmentation. In this paper, we propose LightReSeg, a retinal layer segmentation network that can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure. The encoder employs multi-scale feature extraction and a Transformer block to fully exploit the semantic information of the feature maps at all scales and to give the features better global reasoning capabilities. For the decoder, we design a multi-scale asymmetric attention (MAA) module that preserves the semantic information at each encoder scale. Experiments show that, with only 3.3M parameters, our approach achieves better segmentation performance than the current state-of-the-art method TransUnet (105.7M parameters) on both our collected dataset and two other public datasets.

Paper Structure

This paper contains 24 sections, 16 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Model size vs. segmentation accuracy (mIoU) for state-of-the-art retinal layer segmentation methods. Our LightReSeg achieves the highest mIoU among the SOTA methods while maintaining a smaller model size.
  • Figure 2: The LightReSeg network follows a U-shaped encoder-decoder structure. The encoder takes as input a retinal image of dimension $(3, H, W)$ and performs multi-scale feature extraction, producing feature maps at $N$ scales, where $N$ is set to 4 in our design. The last feature map is passed through a linear transformation to the Transformer Layers, which extract features with long-range reasoning that help reduce segmentation errors in the background region. The resulting features are then reshaped, up-sampled, and fused with the multi-scale encoder features, which are optimized by the proposed MAA module. The final fused feature map $F_{out}$ is used for retinal layer segmentation via convolutions.
  • Figure 3: The structure of Multi-scale Asymmetric Attention module.
  • Figure 4: (a) The visible-light OCT device used for capturing images. (b) Visible-light OCT B-scan image. (c) Annotation image (ground truth). (d) Average percentage of pixels in the OCT images for each tissue layer, excluding the background (the background accounts for 75.06%).
  • Figure 5: (a) and (b) show the performance of the nine mainstream approaches on the three retinal layer segmentation datasets, measured by mIoU and mPA, respectively.
  • ...and 4 more figures
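
The mIoU and mPA metrics named in the captions above can be illustrated with a short sketch. It assumes the commonly used definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes); the paper's exact evaluation protocol may differ. The function names and the toy labels are hypothetical.

```python
# Minimal sketch of mIoU and mPA from flat lists of integer class labels.
# Assumes the standard definitions; not taken from the paper's code.

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_mpa(cm):
    """Return (mIoU, mPA), averaging over classes present in the ground truth."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                        # pixels whose true class is c
        pred = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        if gt == 0:
            continue  # assumed convention: skip classes absent from the labels
        ious.append(tp / (gt + pred - tp))     # intersection over union
        accs.append(tp / gt)                   # per-class pixel accuracy
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example with three classes (0 could stand for background).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mpa = mean_iou_and_mpa(cm)
print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")
```

Because the background covers most pixels in these images, class-averaged metrics such as these weight each retinal layer equally rather than letting the background dominate the score.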