RTA-Former: Reverse Transformer Attention for Polyp Segmentation
Zhikai Li, Murong Yi, Ali Uneri, Sihan Niu, Craig Jones
TL;DR
This work addresses imprecise boundary delineation in polyp segmentation by combining a PVT-based Transformer encoder with a novel Reverse Transformer Attention (RTA) module in the decoder. The proposed RTA-Former uses a Hierarchical Feature Synthesizer to fuse multi-scale transformer features and applies a reverse attention mechanism to emphasize difficult edge regions, yielding state-of-the-art results on five polyp datasets. Across multiple backbone sizes, the method demonstrates strong learning and generalization, outperforming CNN-based approaches and indicating practical potential for clinical decision support. The choice of backbone size also lets users trade accuracy against computational cost, and the code is publicly available for reproducibility.
Abstract
Polyp segmentation is a key aspect of colorectal cancer prevention, enabling early detection and guiding subsequent treatments. Intelligent diagnostic tools, including deep learning solutions, are widely explored to streamline and potentially automate this process. However, even with many powerful network architectures, accurate edge segmentation remains a challenge. In this paper, we introduce a novel network, RTA-Former, that employs a transformer model as the encoder backbone and adapts Reverse Attention (RA) with a transformer stage in the decoder for enhanced edge segmentation. Experimental results show that RTA-Former achieves state-of-the-art (SOTA) performance on five polyp segmentation datasets. The strong capability of RTA-Former holds promise for improving the accuracy of Transformer-based polyp segmentation, potentially leading to better clinical decisions and patient outcomes. Our code is publicly available on GitHub.
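To make the reverse attention idea concrete, the following is a minimal sketch of the weighting step as it is commonly formulated in prior reverse attention work: each feature value is scaled by one minus the sigmoid of a coarse prediction logit, so confidently predicted foreground is suppressed and the remaining, less certain regions (which tend to include boundaries) keep more of their signal. This is an illustrative assumption, not the RTA-Former implementation: the function names `reverse_attention_weights` and `apply_reverse_attention` are hypothetical, the toy inputs are made up, and the transformer stage that the RTA module adds in the decoder is not modeled here.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention_weights(coarse_logits):
    """Per-pixel weight 1 - sigmoid(logit) (assumed common RA formulation).

    High logits (confident foreground) map to weights near 0,
    low logits (confident background) map to weights near 1.
    """
    return [[1.0 - sigmoid(v) for v in row] for row in coarse_logits]


def apply_reverse_attention(features, coarse_logits):
    """Scale a 2D feature map element-wise by the reverse attention weights."""
    weights = reverse_attention_weights(coarse_logits)
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(features, weights)
    ]


# Toy 1x3 example (hypothetical values): confident foreground,
# uncertain pixel, confident background.
coarse_logits = [[4.0, 0.0, -4.0]]
features = [[1.0, 1.0, 1.0]]
weighted = apply_reverse_attention(features, coarse_logits)
```

In this toy case the confidently segmented pixel is damped most strongly, while the uncertain and background pixels keep progressively more of their feature value, which is the behavior that lets later decoder layers concentrate on regions the coarse prediction has not yet resolved.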
