
From Labels to Priors in Capsule Endoscopy: A Prior Guided Approach for Improving Generalization with Few Labels

Anuja Vats, Ahmed Mohammed, Marius Pedersen

TL;DR

The paper addresses the poor generalization of WCE pathology classification under limited labels by introducing a domain-prior guided unsupervised pretraining framework. It combines Prior Guided Contrast (PGCon) and Within Instance Negative (WINCon) to bias representations toward pathology-related features using a redness-based prior and locality priors, paired with a memory-bank-based InfoNCE objective. Across zero-shot, linear, and full fine-tuning evaluations on multiple WCE datasets, PGCon and WINCon demonstrate strong cross-dataset generalization and competitive performance relative to ImageNet supervised pretraining, often surpassing prior unsupervised methods. The results indicate that simple domain priors can yield pathology-aware representations with reduced labeling burden, and the work provides benchmarks and pretrained weights to facilitate broader adoption in WCE diagnostics.
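The TL;DR mentions a redness-based prior but does not define how redness is scored. The following is a minimal sketch, not the authors' implementation, of how such a prior could produce the pathology-aware view $v_p$ and the pathology-ignorant view $v_{win}$ described in Figure 1. It assumes a per-pixel redness score R/(R+G+B), a fixed threshold of 0.6, a bounding-box crop for $v_p$, and black fill for masked pixels. All of these choices, and the helper names `redness`, `suspect_mask`, `prior_view` and `win_view`, are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]      # image as rows of pixels

def redness(p: Pixel) -> float:
    """Hypothetical redness score: fraction of the pixel's total intensity that is red."""
    r, g, b = p
    total = r + g + b
    return r / total if total > 0 else 0.0

def suspect_mask(img: Image, threshold: float = 0.6) -> List[List[bool]]:
    """Mark pixels whose redness exceeds a (hypothetical) threshold as suspected pathology."""
    return [[redness(p) > threshold for p in row] for row in img]

def prior_view(img: Image, threshold: float = 0.6) -> Image:
    """Pathology-aware view: crop the bounding box of all suspected pixels.

    Falls back to a copy of the full image when no pixel exceeds the threshold.
    """
    mask = suspect_mask(img, threshold)
    coords = [(i, j) for i, row in enumerate(mask) for j, m in enumerate(row) if m]
    if not coords:
        return [list(row) for row in img]
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
    return [row[left:right + 1] for row in img[top:bottom + 1]]

def win_view(img: Image, threshold: float = 0.6, fill: Pixel = (0, 0, 0)) -> Image:
    """Pathology-ignorant view: replace suspected pixels with `fill`, keep the rest."""
    mask = suspect_mask(img, threshold)
    return [[fill if m else p for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]

# Toy 3x3 image: pinkish mucosa-like pixels with one strongly red pixel in the centre.
img = [[(180, 120, 110)] * 3 for _ in range(3)]
img[1][1] = (200, 30, 20)
v_p = prior_view(img)    # crop containing only the red centre pixel
v_win = win_view(img)    # same image with the centre pixel blanked out
```

A bounding-box crop keeps the sketch short; before encoding, the crop would presumably need resizing to the encoder's input resolution.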

Abstract

The lack of generalizability of deep learning approaches for the automated diagnosis of pathologies in Wireless Capsule Endoscopy (WCE) has prevented any significant advantages from trickling down to real clinical practice. As a result, disease management using WCE continues to depend on exhaustive manual investigation by medical experts, which explains the limited use of WCE despite its several advantages. Prior works have tried to tackle the lack of generalization with labels of higher quality and quantity; however, this is hardly scalable given the diversity of pathologies, and labeling large datasets places an additional burden on medical staff. We propose using freely available domain knowledge as priors to learn more robust and generalizable representations. We show experimentally that domain priors can benefit representations by acting as a proxy for labels, significantly reducing the labeling requirement while still enabling fully unsupervised yet pathology-aware learning. We use a contrastive objective together with prior-guided views during pretraining, where the choice of views encourages sensitivity to pathological information. Extensive experiments on three datasets show that our method outperforms (or closes the gap with) the state of the art in the domain, establishing a new benchmark in pathology classification and cross-dataset generalization, as well as scaling to unseen pathology categories.
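The contrastive objective can be illustrated as InfoNCE over cosine similarities. In the sketch below, the prior-view embedding $z_p$ is the anchor, the distorted-view embedding $z_d$ is the positive, and memory-bank entries of other images act as global negatives; passing WIN embeddings $z_{win}$ as extra negatives mimics the WINCon setting of Figure 2(b). The temperature of 0.07, the momentum update of the bank, the 2-d toy vectors and all function names are hypothetical choices, and the way the memory-bank entry $R_i$ enters the paper's loss is not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]

def l2_normalize(v: Sequence[float]) -> Vec:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def info_nce(anchor: Vec, positive: Vec, negatives: List[Vec], tau: float = 0.07) -> float:
    """-log of the softmax probability of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(logit - m) for logit in logits))
    return log_denom - logits[0]

def update_bank(bank: Dict[int, Vec], idx: int, z: Vec, momentum: float = 0.5) -> None:
    """Hypothetical momentum update of the memory-bank entry for image `idx`."""
    old = bank.get(idx, z)
    bank[idx] = l2_normalize([momentum * o + (1 - momentum) * n for o, n in zip(old, z)])

# Toy example with 2-d embeddings (the paper's Figure 3 uses 128-d feature vectors).
bank = {0: l2_normalize([1.0, 0.0]), 1: l2_normalize([0.0, 1.0]), 2: l2_normalize([-1.0, 0.0])}
z_p, z_d = [0.9, 0.1], [1.0, 0.05]               # prior and distorted views of image 0
global_negs = [bank[j] for j in bank if j != 0]  # bank entries of the other images
z_win = [[0.2, 0.9]]                             # WIN embeddings used as extra negatives
loss_pgcon_like = info_nce(z_p, z_d, global_negs)
loss_wincon_like = info_nce(z_p, z_d, global_negs + z_win)
update_bank(bank, 0, z_d)                        # refresh image 0's entry after the step
```

Adding negatives can only enlarge the softmax denominator, so for the same anchor and positive the WINCon-style loss is never smaller than the PGCon-style one; training then has to push the extra negatives away to recover.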

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Proposed approach: Given an unlabeled WCE image $v^i$, we use priors to create special views, namely a pathology-aware view $v_p^i$, a pathology-ignorant view $v_{win}^i$ and a distorted view $v_d^i$; $z_*^*$ denotes the encodings of these views. We use combinations of these views with contrastive objectives to strategically emphasize pathology features during training. The resulting feature space (pink circle) shows how the pathology-aware features $z_p^i$ and $z^j$ dominate the output space and move away from the pathology-ignorant features $z_{win}$. We show the real contrastive space and analyze its properties in Figure 3 and in the section on the embedding space.
  • Figure 2: Overview of the proposed objectives. (a) Two views $v_p^i$ (prior view) and $v_d^i$ (distorted view) constructed from the same image are encoded as $z_p^i$ and $z_d^i$, respectively. The contrastive objective uses these, as well as representations $R_{GN}$ and $R_i$ from an evolving memory bank $\mathcal{M}$, to minimize the distance between positives and maximize the distance between negatives. (b) In addition to $v_p^i$ and $v_d^i$, WINCon uses views $v_{win}$ derived from all images in the batch (B) by removing regions suspected of pathology. These $v_{win}$ are encoded as $z_{win}$ and used as additional negatives. Refer to the PGCon and WINCon sections for more details.
  • Figure 3: Evolution of the embedding space: PCA of 128-dimensional feature vectors for $z_p$, $z_d$, global negatives (GN) and WIN. PGCon: Initially, the embeddings form separate localized clusters corresponding to $z_p$, $z_d$ and the global negatives, but as the embeddings gradually specialize in pathology regions (i.e., start exhibiting invariance to other factors), they merge and spread out into a space of pathologies. WINCon: Initially, the WINs lie close to the corresponding prior views ($z_p$) because they are parts of the same image. Interestingly, however, as the embeddings become increasingly specialized in pathologies, the same WINs are pushed away from the pathology-based embeddings, i.e., $z_p$ and the GNs. Although the WINs are very diverse, they form a dense cluster, suggesting a tendency towards invariance to normal variations and high variance towards pathologies.
  • Figure 4: (a) Alignment and uniformity: The graph plots $\mathcal{L}_{align}$ against $\mathcal{L}_{uniform}$ for different encoders evaluated on OSF-Kvasir, CAD-CAP and a subset of the training data. Points are color-coded by full fine-tuning accuracy. (b) Activation maps: Visualization using ScoreCAM wang2020scorecam shows the effectiveness of our approach for non-red (polyp, ulcer) and low-prevalence (in train and test sets) pathologies. It also shows that WINCon exhibits higher locality than PGCon. For more visualizations, refer to the supplementary video.
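Figure 4(a) compares encoders by alignment and uniformity, but the caption gives no formulas. The sketch below assumes the common definitions from Wang and Isola (2020), computed on L2-normalized embeddings with exponent α = 2 and t = 2: $\mathcal{L}_{align}$ is the mean of $\lVert f(x)-f(y)\rVert_2^{\alpha}$ over positive pairs, and $\mathcal{L}_{uniform}$ is the log of the mean of $e^{-t\lVert f(x)-f(y)\rVert_2^2}$ over pairs of distinct samples. Lower is better for both. The helper names are illustrative, and whether the paper uses exactly these variants is an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec = List[float]

def normalize(v: Sequence[float]) -> Vec:
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def align_loss(pairs: List[Tuple[Vec, Vec]], alpha: float = 2.0) -> float:
    """Mean ||f(x) - f(y)||^alpha over positive pairs; 0 means perfectly aligned."""
    return sum(sq_dist(normalize(a), normalize(b)) ** (alpha / 2) for a, b in pairs) / len(pairs)

def uniform_loss(embs: List[Vec], t: float = 2.0) -> float:
    """log mean exp(-t ||f(x) - f(y)||^2) over distinct pairs; more negative = more uniform."""
    z = [normalize(e) for e in embs]
    potentials = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(z, 2)]
    return math.log(sum(potentials) / len(potentials))

# Collapsed embeddings score 0 (the worst value); antipodal points on the circle score -8.
collapsed = uniform_loss([[1.0, 0.0], [3.0, 0.0]])
spread = uniform_loss([[1.0, 0.0], [-1.0, 0.0]])
```

Plotting the two values for each encoder, as in Figure 4(a), places encoders that are both well aligned and uniform toward the lower-left corner.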