Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation
Quang Vinh Nguyen, Van Thong Huynh, Soo-Hyung Kim
TL;DR
The paper addresses the challenge of polyp segmentation in colonoscopy images, where uncertain areas and background similarity hinder accurate delineation. It introduces ADSNet, a CNN-based framework with an EfficientNet-V2S encoder, a Complementary Trilateral Decoder that generates an early global map $M = \text{CTD}(f_1,f_2,f_3,f_4)$, and a Continuous Attention module that yields Background Semantic ($BS$) and Object Semantic ($OS$) to refine difficult regions, optimized by a joint $Loss(y,\hat{y}) = ACE(y,\hat{y}) + BCE(y,\hat{y})$. The approach achieves state-of-the-art Dice, IoU, and MAE on multiple polyp benchmarks and demonstrates strong generalization to unseen datasets, while remaining compatible with other CNN or Transformer encoders. By explicitly modeling uncertain areas and recovering weak features through $OS$ and $BS$, ADSNet enhances robustness and clinical utility in automated polyp segmentation.
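As a minimal sketch of the joint objective $Loss(y,\hat{y}) = ACE(y,\hat{y}) + BCE(y,\hat{y})$, the snippet below implements only the standard binary cross-entropy term over flattened pixel lists. The ACE term is not defined in this section, so it is passed in as a caller-supplied function. The names `bce`, `joint_loss`, and `ace` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Callable, Sequence

Pixels = Sequence[float]


def bce(y: Pixels, y_hat: Pixels, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between ground truth y and prediction y_hat.

    Both inputs are flattened masks with values in [0, 1].
    """
    total = 0.0
    for t, p in zip(y, y_hat):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y)


def joint_loss(y: Pixels, y_hat: Pixels,
               ace: Callable[[Pixels, Pixels], float]) -> float:
    """Sum of a caller-supplied ACE term and the BCE term.

    `ace` is a hypothetical placeholder: its definition is not given in
    this section, so no particular formula is assumed here.
    """
    return ace(y, y_hat) + bce(y, y_hat)


if __name__ == "__main__":
    y_true = [1.0, 0.0, 1.0, 0.0]
    y_pred = [0.9, 0.1, 0.6, 0.3]
    # A zero-valued stand-in for ACE, used only to exercise the function.
    print(joint_loss(y_true, y_pred, ace=lambda a, b: 0.0))
```

Passing the ACE term as a function keeps the sketch honest about what is and is not specified here: the two terms are simply added, with no weighting factor, as in the formula above.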
Abstract
Colonoscopy is a common and practical method for detecting and treating polyps. Segmenting polyps from colonoscopy images is useful for diagnosis and surgery. Nevertheless, achieving high segmentation performance remains difficult because of polyp characteristics such as shape, color, and condition, and because polyps are often hard to distinguish from the surrounding context. This work presents a novel architecture, Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation (ADSNet), which corrects misclassified details and recovers weak features that tend to vanish and go undetected at the final stage. The architecture consists of a complementary trilateral decoder that produces an early global map, and a continuous attention module that modifies the semantics of high-level features to analyze two separate semantics of the early global map. The proposed method is evaluated on polyp benchmarks for both learning ability and generalization ability. Experimental results demonstrate strong correction and recovery ability, leading to better segmentation performance than other state-of-the-art methods on the polyp image segmentation task. Notably, the proposed architecture can be flexibly combined with other CNN-based encoders, Transformer-based encoders, and decoder backbones.
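Segmentation performance on polyp benchmarks is reported here in terms of Dice, IoU, and MAE. The sketch below shows the usual definitions of these three metrics on flattened masks. It assumes a binary ground truth, a probability map as the prediction, and a 0.5 threshold for Dice and IoU; the threshold, the smoothing constant `eps`, and the function names are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence


def _binarize(y_hat: Sequence[float], threshold: float) -> List[float]:
    """Turn a probability map into a hard 0/1 mask."""
    return [1.0 if p >= threshold else 0.0 for p in y_hat]


def dice(y: Sequence[float], y_hat: Sequence[float],
         threshold: float = 0.5, eps: float = 1e-7) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred = _binarize(y_hat, threshold)
    inter = sum(t * p for t, p in zip(y, pred))
    return (2.0 * inter + eps) / (sum(y) + sum(pred) + eps)


def iou(y: Sequence[float], y_hat: Sequence[float],
        threshold: float = 0.5, eps: float = 1e-7) -> float:
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    pred = _binarize(y_hat, threshold)
    inter = sum(t * p for t, p in zip(y, pred))
    union = sum(y) + sum(pred) - inter
    return (inter + eps) / (union + eps)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error between the mask and the raw probability map."""
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    y_true = [1.0, 1.0, 0.0, 0.0]
    y_pred = [0.9, 0.2, 0.8, 0.1]
    print(dice(y_true, y_pred), iou(y_true, y_pred), mae(y_true, y_pred))
```

Higher Dice and IoU indicate better overlap with the ground truth, while lower MAE indicates a prediction map closer to the mask pixel by pixel.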
