ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

Yuncheng Jiang, Zixun Zhang, Yiwen Hu, Guanbin Li, Xiang Wan, Song Wu, Shuguang Cui, Silin Huang, Zhen Li

TL;DR

ECC-PolypDet tackles automatic polyp detection in colonoscopy by introducing a two-stage training paradigm that combines Box-assisted Contrastive Learning (BCL) with a Semantic Flow-guided FPN (SFFPN) and Heatmap Propagation (HP) to robustly detect concealed and small polyps; it then applies IoU-guided Sample Re-weighting (ISR) for hard-sample fine-tuning. The approach yields state-of-the-art results across six datasets with strong generalization, while maintaining practical inference speed by keeping most computations limited to training. Key innovations include a box-guided contrastive objective, multi-scale semantic alignment, progressive heatmap refinement, and adaptive loss re-weighting, all demonstrated through extensive ablations and cross-domain tests. The work holds practical significance for clinical colonoscopy by improving detection sensitivity and reducing missed polyps without sacrificing real-time performance.

Abstract

Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training and end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
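
The fine-tuning idea described above, giving hard samples a larger loss weight, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' formulation: the abstract says the weights are adjusted adaptively per sample and guided by IoU, but it does not give the weighting function. The linear form `1 + alpha * (1 - iou)`, the parameter `alpha`, and all function names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def isr_weights(ious: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Hypothetical weighting (an assumption, not the paper's formula):
    a sample whose prediction has low IoU with its ground truth is
    treated as hard and receives a larger weight."""
    return [1.0 + alpha * (1.0 - iou) for iou in ious]


def reweighted_loss(losses: Sequence[float], ious: Sequence[float],
                    alpha: float = 1.0) -> float:
    """Weighted mean of per-sample losses using the weights above."""
    weights = isr_weights(ious, alpha)
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

With this form, a batch containing one well-localized sample and one poorly localized sample yields a loss that is pulled toward the poorly localized one, which is the qualitative behavior the abstract attributes to ISR.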

Paper Structure (27 sections, 13 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: Illustration of the pipeline of our ECC-PolypDet. We add a supervised contrastive learning branch to increase the model's recognition capability of concealed polyps. We further modify the feature pyramid network (FPN) and detection head structure to capture small object features. Finally, we fine-tune on the hard samples via a loss re-weighting method. During inference, our model runs end to end, and the modules that the dashed lines pass through are removed.
  • Figure 2: Detailed illustration of our ECC-PolypDet framework. It consists of the Semantic Flow-guided FPN (SFFPN), the CenterNet with a Heatmap Propagation (HP) module, a Box-assisted Contrastive Learning (BCL) module, and an IoU-guided Sample Re-weighting (ISR) module. Our ECC-PolypDet is jointly trained with detection loss and contrastive loss. After the first learning stage, the model is fine-tuned with adaptive sample importance weights computed by ISR. During inference, the BCL and ISR modules are removed, and only the SFFPN and HP modules are used for prediction. The algorithm of our pipeline is presented in Algorithm 1.
  • Figure 3: (a) Overview of the Semantic Flow-guided Feature Pyramid Network (SFFPN). (b) Details of the semantic flow alignment (SFA) module. SFA learns semantic flow from high- and low-resolution features, and SFFPN fuses features at different scales into a high-resolution feature.
  • Figure 4: Process of Box-assisted Contrastive Learning (BCL). In the Box-to-Mask Transformation stage, we use the bounding box annotation to generate the binary masks. The fused features are merged with the masks to get foreground and background features. Next, in the contrastive learning stage, two random foreground features and all the background features are used to calculate $L_{CL}$ (the InfoNCE loss).
  • Figure 5: Visualization of the effect of the adaptive hard-mining strategy on the per-image training loss. Each point denotes the IoU-loss relationship of one sample. Before the fine-tuning stage, the points are scattered and many samples fall in abnormal areas. After fine-tuning, most samples are concentrated in normal areas.
  • ...and 5 more figures
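
The Figure 4 caption describes two steps: turning a bounding box into a binary mask, then computing an InfoNCE loss from two random foreground features and all background features. A minimal sketch of those two steps follows. Several details are assumptions made here for illustration and are not taken from the paper: features are treated as per-cell vectors on a small grid, boxes are given in feature-map cells with an exclusive upper bound, similarity is cosine similarity, and the temperature value `0.07` and all function names are hypothetical. Under these assumptions the loss for anchor $q$, positive $k^{+}$, and negatives $k^{-}_{i}$ is $L_{CL} = -\log \frac{\exp(s(q, k^{+})/\tau)}{\exp(s(q, k^{+})/\tau) + \sum_i \exp(s(q, k^{-}_{i})/\tau)}$.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Grid = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def box_to_mask(height: int, width: int,
                box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Binary mask that is 1 inside the box (x1, y1, x2, y2), 0 elsewhere.
    Assumption: coordinates are in grid cells, upper bounds exclusive."""
    x1, y1, x2, y2 = box
    return [[1 if (x1 <= x < x2 and y1 <= y < y2) else 0
             for x in range(width)] for y in range(height)]


def split_features(features: Grid, mask: List[List[int]]):
    """Separate feature vectors into foreground and background lists."""
    fg, bg = [], []
    for feat_row, mask_row in zip(features, mask):
        for vec, inside in zip(feat_row, mask_row):
            (fg if inside else bg).append(vec)
    return fg, bg


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(anchor: Vector, positive: Vector,
             negatives: Sequence[Vector], temperature: float = 0.07) -> float:
    """InfoNCE loss with one positive and many negatives."""
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - pos


def bcl_loss(features: Grid, box: Tuple[int, int, int, int],
             rng: random.Random, temperature: float = 0.07) -> float:
    """Two random foreground features form the anchor/positive pair;
    every background feature acts as a negative.
    Requires at least two foreground cells inside the box."""
    height, width = len(features), len(features[0])
    fg, bg = split_features(features, box_to_mask(height, width, box))
    anchor, positive = rng.sample(fg, 2)
    return info_nce(anchor, positive, bg, temperature)
```

When foreground and background features point in clearly different directions the loss is close to zero, and when they are indistinguishable it approaches the logarithm of the number of candidates, so minimizing it pushes the two groups apart. Since the pipeline removes the BCL branch at inference, a loss of this kind adds cost only during training.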