Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification

Kunpeng Qiu, Zhiying Zhou, Yongxin Guo

TL;DR

This work tackles the challenges of lesion classification in Wireless Capsule Endoscopy (WCE) images, where tiny lesions and background noise impair accuracy. It introduces Decoupled Supervised Contrastive Learning (DSCL) combined with a Saliency Augmentor (SA) to learn robust, multi-view representations from WCE data via a two-stage training pipeline: (1) DSCL-based feature learning using SA-generated views, and (2) training a linear classifier on frozen features. DSCL mitigates negative–positive coupling in supervised contrastive learning and enhances intra-class compactness while promoting inter-class separation, leading to an overall accuracy of 92.01% on a merged dataset and outperforming prior methods. The approach yields faster inference by avoiding resampling and demonstrates strong gains in intra-/inter-class discriminability, with supporting ablations and visualizations. This method holds practical potential for improving automated GI lesion screening from WCE imagery.

Abstract

Accurate lesion classification in Wireless Capsule Endoscopy (WCE) images is vital for early diagnosis and treatment of gastrointestinal (GI) cancers. However, this task is confronted with challenges like tiny lesions and background interference. Additionally, WCE images exhibit high intra-class variance and inter-class similarity, adding complexity. To tackle these challenges, we propose Decoupled Supervised Contrastive Learning for WCE image classification, learning robust representations from zoomed-in WCE images generated by the Saliency Augmentor. Specifically, we use uniformly down-sampled WCE images as anchors and WCE images from the same class, especially their zoomed-in images, as positives. This approach empowers the Feature Extractor to capture rich representations from various views of the same image, facilitated by Decoupled Supervised Contrastive Learning. Training a linear Classifier on these representations within 10 epochs yields an impressive 92.01% overall accuracy, surpassing the prior state-of-the-art (SOTA) by 0.72% on a blend of two publicly accessible WCE datasets. Code is available at: https://github.com/Qiukunpeng/DSCL.
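
To make the anchor/positive idea concrete, here is a minimal sketch of one way a decoupled supervised contrastive loss could be written. The exact loss used in the paper is not reproduced on this page, so the formulation below is an assumption: it takes the usual supervised contrastive loss and drops same-class samples from the denominator, so positives and negatives no longer compete in the same normalization term. The function name `decoupled_supcon_loss`, the temperature value, and the toy feature vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decoupled_supcon_loss(features, labels, temperature=0.1):
    """Assumed decoupled supervised contrastive loss (not the authors' code).

    For each anchor i:
      loss_i = -mean_{p in P(i)} (z_i . z_p / t)
               + log sum_{n in N(i)} exp(z_i . z_n / t)
    where P(i) holds same-class samples (excluding i) and N(i) holds
    different-class samples. Positives are left out of the log-sum term.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        negatives = [j for j in range(n) if labels[j] != labels[i]]
        if not positives or not negatives:
            continue  # anchor has nothing to pull toward or push away from
        # Numerically stable log-sum-exp over negatives only.
        neg_logits = [dot(z[i], z[j]) / temperature for j in negatives]
        m = max(neg_logits)
        log_neg = m + math.log(sum(math.exp(x - m) for x in neg_logits))
        # Average similarity to positives (e.g. zoomed-in views of the class).
        pos_term = sum(dot(z[i], z[j]) / temperature for j in positives) / len(positives)
        total += log_neg - pos_term
        counted += 1
    return total / counted if counted else 0.0


# Toy batch: two tight clusters in 2-D.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = decoupled_supcon_loss(feats, [0, 0, 1, 1])   # labels match clusters
shuffled = decoupled_supcon_loss(feats, [0, 1, 0, 1])  # labels cut across clusters
print(aligned < shuffled)
```

In a setup like the one described above, the feature list would hold embeddings of both the uniformly down-sampled anchors and their zoomed-in views, with class labels shared across views. The loss is lower when same-class embeddings sit close together and different-class embeddings sit far apart, which is the intra-class compactness and inter-class separation the method aims for.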

Paper Structure (13 sections, 5 equations, 2 figures, 3 tables)


Figures (2)

  • Figure 1: The overall architecture of our proposed method. (a) Framework of Saliency Augmentor; (b) Principle diagram of non-uniform down-sampling; (c) Framework of Decoupled Supervised Contrastive Learning.
  • Figure 2: The t-SNE Visualization of Feature Distribution. (a) Original WCE Images; (b) Output of $\mathcal{L}_{CE}$; (c) Output of $\mathcal{L}_{CE}$ with Zoomed-In; (d) Output of $\mathcal{L}_{DSCL}$ with Zoomed-In.