
TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image

Xiang Li, Mingsi Liu, Lixin Duan

TL;DR

TransUNext, a more advanced U-shaped hybrid Transformer-CNN architecture, is proposed. It integrates an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to capture both local features and global dependencies with minimal computational overhead.
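The efficient self-attention idea can be sketched as multi-head attention in which keys and values are computed from a spatially downsampled feature map, so that N queries attend to only N/r² pooled tokens instead of all N tokens. This is one common form of efficient attention (for example, the spatial-reduction attention of PVT), not necessarily the exact mechanism used in TransUNext. The average-pooling reduction, the function names (`avg_pool2d`, `efficient_mhsa`) and the random weights are assumptions for illustration. Passing separate query and key/value maps mimics the two-input decoder variant of Figure 5, under the further assumption that queries come from the high-resolution map so the output keeps that resolution.

```python
import numpy as np


def avg_pool2d(x, r):
    """Average-pool an (H, W, C) map by factor r (assumed reduction op)."""
    h, w, c = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    return x.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def efficient_mhsa(q_feat, kv_feat, w_q, w_k, w_v, w_o, num_heads, reduction):
    """Multi-head attention with spatially reduced keys/values (hypothetical sketch).

    q_feat:  (Hq, Wq, C) map providing the queries; it sets the output resolution.
    kv_feat: (Hk, Wk, C) map providing keys and values; pooled by `reduction`.
    Encoder-style self-attention passes the same map for both arguments.
    """
    hq, width_q, c = q_feat.shape
    d = c // num_heads
    q = q_feat.reshape(-1, c) @ w_q                      # (Nq, C)
    kv = avg_pool2d(kv_feat, reduction).reshape(-1, c)   # (Nk / r^2, C)
    k, v = kv @ w_k, kv @ w_v
    # Split channels into heads: (heads, tokens, d)
    q, k, v = (t.reshape(-1, num_heads, d).transpose(1, 0, 2) for t in (q, k, v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, Nq, Nk / r^2)
    out = (attn @ v).transpose(1, 0, 2).reshape(-1, c) @ w_o
    return out.reshape(hq, width_q, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = 32
    w = [rng.standard_normal((C, C)) * 0.1 for _ in range(4)]
    enc = rng.standard_normal((16, 16, C))
    # 256 queries attend to 4x4 = 16 pooled tokens: a 256x16 attention matrix
    # per head instead of 256x256.
    print(efficient_mhsa(enc, enc, *w, num_heads=4, reduction=4).shape)
    high = rng.standard_normal((32, 32, C))   # high-resolution (skip) features
    low = rng.standard_normal((16, 16, C))    # low-resolution decoder features
    print(efficient_mhsa(high, low, *w, num_heads=4, reduction=2).shape)
```

Pooling only the keys and values keeps one output token per query, which is why the same function can serve both the encoder (one input) and a decoder that fuses two resolutions.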

Abstract

Purpose: Automatic and accurate segmentation of vessels in fundus images has become an essential prerequisite for computer-aided diagnosis of diseases with retinal manifestations, such as diabetes mellitus. High-precision retinal vessel segmentation remains difficult because of the low contrast between the vessel branch ends and the background, the long and thin span of the vessels, and the variable morphology of the optic disc and optic cup in fundus images.

Methods: We propose TransUNext, a more advanced U-shaped hybrid Transformer-CNN architecture that integrates an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to capture both local features and global dependencies with minimal computational overhead. In addition, a Global Multi-Scale Fusion (GMSF) module is introduced to upgrade the skip connections: it fuses high-level semantic information with low-level detail and eliminates the semantic gap between high- and low-level features. Inspired by ConvNeXt, the TransNeXt Block is designed to reduce the computational complexity of each base block in U-Net and to avoid the information loss caused by dimension compression when information is converted between feature spaces of different dimensions.

Results: We evaluated the proposed method on four public datasets: DRIVE, STARE, CHASE-DB1, and HRF. The AUC (area under the ROC curve) values were 0.9867, 0.9869, 0.9910, and 0.9887, respectively, exceeding those of other state-of-the-art methods.
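The reported AUC values are areas under the ROC curve of the per-pixel vessel probabilities against the ground truth. A minimal pure-Python sketch of that metric uses the rank-based (Mann-Whitney U) formulation, with tied scores receiving averaged ranks. The function name `roc_auc` is hypothetical, and restricting the pixels to the field-of-view mask before flattening is a common convention assumed here, not a protocol stated in this summary.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic.

    scores: predicted vessel probabilities (one per pixel, e.g. inside the FOV mask).
    labels: ground truth, 1 for vessel and 0 for background.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i..j] and give each the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both vessel and background pixels")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U = rank_sum_pos - n_pos(n_pos + 1)/2, normalised by the number of pos/neg pairs
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy "pixels": flattened probabilities and their ground-truth labels.
    probs = [0.1, 0.4, 0.35, 0.8]
    truth = [0, 0, 1, 1]
    print(roc_auc(probs, truth))  # 0.75
```

The rank formulation equals the probability that a randomly chosen vessel pixel scores higher than a randomly chosen background pixel, which is why AUC is insensitive to the heavy class imbalance between vessel and background pixels.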

Paper Structure

This paper contains 16 sections, 7 equations, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: Comparison of several hybrid architectures of Transformer and CNN. (Color figure online)
  • Figure 2: The flowchart of data preprocessing.
  • Figure 3: Illustration of random cropping: (a) patches from the original image; (b) the corresponding ground-truth patches.
  • Figure 4: (a) The architecture of TransUNext; (b) the hybrid block combining Transformer and ConvNeXt components (TransNeXt Block).
  • Figure 5: Efficient multi-head self-attention (MHSA). (a) The MHSA used in the Transformer encoder. (b) The MHSA used in the Transformer decoder. Both follow the same design, but (b) takes two inputs: the high-resolution encoder features from the GMSF module and the low-resolution features from the decoder.
  • ...and 5 more figures
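Figure 3 describes training on random crops, with the image patches and ground-truth patches taken at the same positions. A minimal NumPy sketch of such paired cropping follows, assuming crop positions are sampled uniformly; the helper name `random_patch_pairs`, the 48×48 patch size and the patch count are hypothetical illustrative choices, since the authors' settings are not given in this summary.

```python
import numpy as np


def random_patch_pairs(image, label, patch_size, num_patches, rng=None):
    """Crop identical random windows from a fundus image and its vessel mask.

    image: (H, W) or (H, W, C) array; label: (H, W) ground-truth mask.
    Returns (num_patches, p, p[, C]) image patches and (num_patches, p, p) labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    if image.shape[:2] != label.shape[:2]:
        raise ValueError("image and label must share height and width")
    h, w = label.shape[:2]
    if patch_size > h or patch_size > w:
        raise ValueError("patch_size exceeds image size")
    img_patches, lbl_patches = [], []
    for _ in range(num_patches):
        # Same top-left corner for image and mask keeps pixels aligned.
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        img_patches.append(image[top:top + patch_size, left:left + patch_size])
        lbl_patches.append(label[top:top + patch_size, left:left + patch_size])
    return np.stack(img_patches), np.stack(lbl_patches)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fundus = rng.random((584, 565, 3))             # stand-in for a DRIVE-sized RGB image
    mask = (rng.random((584, 565)) > 0.9).astype(np.uint8)
    imgs, lbls = random_patch_pairs(fundus, mask, 48, 8, rng=rng)
    print(imgs.shape, lbls.shape)                  # (8, 48, 48, 3) (8, 48, 48)
```

Sampling many small patches per image enlarges the effective training set, which matters for datasets as small as DRIVE, STARE and CHASE-DB1.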