
EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis

Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang

TL;DR

EndoFinder addresses the need for explainable, real-time polyp diagnosis by reframing optical assessment as content-based image retrieval against a reference database of 'digital twin' polyps. It trains a polyp-aware ViT encoder with self-supervised learning that combines masked image modeling and contrastive learning, yielding embeddings $z = E_\phi(I)$ that are binarized into hash codes for fast retrieval. An adaptive masking strategy guided by segmentation masks focuses the encoder on informative polyp regions. Retrieval relies on semantic hashing, and pathology is predicted by a voting rule over the $K$ nearest neighbors. Across the Polyp-18k, Polyp-Twin, and Polyp-Path datasets, EndoFinder achieves state-of-the-art polyp re-identification and competitive optical biopsy performance while providing real-time, explainable decision support through a scalable hashing-based framework.
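To make the retrieval step concrete, here is a minimal Python sketch of hash-based $K$-nearest-neighbor voting. It assumes sign thresholding as the binarization of $z$ (the summary says only that embeddings are binarized) and a plain majority vote. The helper names `binarize`, `hamming`, `build_index`, and `knn_vote`, as well as the toy embeddings and labels, are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

HashCode = int  # binary code packed into a Python int, one bit per dimension


def binarize(z: Sequence[float]) -> HashCode:
    """Pack a real-valued embedding into a hash code.

    Assumption (not from the paper): bit i is set when z[i] > 0 (sign thresholding).
    """
    code = 0
    for i, value in enumerate(z):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: HashCode, b: HashCode) -> int:
    """Number of differing bits between two hash codes (XOR, then count ones)."""
    return bin(a ^ b).count("1")


def build_index(embeddings, labels) -> List[Tuple[HashCode, str]]:
    """Binarize every reference embedding once and keep its pathology label."""
    return [(binarize(z), y) for z, y in zip(embeddings, labels)]


def knn_vote(query: Sequence[float], index, k: int):
    """Return the majority label among the k Hamming-nearest references.

    Assumed tie handling: equal distances keep index order (sorted is stable);
    equal vote counts go to the label whose first supporter is nearest,
    because Counter.most_common keeps first-encountered order for ties.
    """
    q = binarize(query)
    neighbours = sorted(index, key=lambda item: hamming(q, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours


if __name__ == "__main__":
    # Toy 4-dimensional embeddings with illustrative labels.
    index = build_index(
        [[0.9, 0.2, -0.3, 0.5], [0.8, 0.1, -0.5, 0.4],
         [-0.7, -0.2, 0.6, -0.1], [-0.6, 0.3, 0.4, -0.2]],
        ["adenoma", "adenoma", "hyperplastic", "hyperplastic"],
    )
    label, neighbours = knn_vote([0.7, 0.4, -0.1, 0.2], index, k=3)
    print(label)  # adenoma
```

Comparing codes with XOR and a bit count is what keeps hashing cheap: each comparison touches a few packed bits rather than a floating-point vector. The returned `neighbours` are also the matched reference polyps whose records would be shown to the clinician as the explanation.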

Abstract

Determining whether malignant polyps need to be resected during colonoscopy screening is crucial for patient outcomes, yet challenging because histopathology examination is time-consuming and costly. While deep learning-based classification models have shown promise in achieving optical biopsy from endoscopic images, they often lack explainability. To overcome this limitation, we introduce EndoFinder, a content-based image retrieval framework that, given a newly detected polyp, finds its 'digital twin' polyp in a reference database. The clinical semantics of the new polyp can then be inferred by referring to the matched ones. EndoFinder pioneers a polyp-aware image encoder that is pre-trained on a large polyp dataset in a self-supervised manner, merging masked image modeling with contrastive learning. This results in a generic embedding space ready for different downstream clinical tasks based on image retrieval. We validate the framework on polyp re-identification and optical biopsy tasks, with extensive experiments demonstrating that EndoFinder not only provides explainable diagnostics but also matches the performance of supervised classification models. Because it relies on image retrieval, EndoFinder has the potential to support diverse downstream decision-making tasks during real-time colonoscopy procedures.
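The abstract says the encoder merges masked image modeling with contrastive learning but does not state the objective. One common way to combine the two, assumed here and not taken from the paper, is a weighted sum of a reconstruction error on masked patches and an InfoNCE loss over two augmented views of each image. The weight `lam`, the temperature `tau`, and every function name below are hypothetical; the sketch uses plain Python lists instead of a deep learning framework so that the arithmetic stays visible.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(views_a: List[Vector], views_b: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: views_a[i] and views_b[i] are two views of image i,
    and every views_b[j] with j != i acts as a negative for views_a[i]."""
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive pair
    return total / n


def masked_mse(pred, target, mask) -> float:
    """Mean squared error over masked patches only (mask[i] True = patch was hidden)."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pp, tp)) / len(pp)
        for pp, tp, m in zip(pred, target, mask)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


def combined_loss(views_a, views_b, pred, target, mask, lam=1.0, tau=0.1) -> float:
    """Hypothetical joint objective: reconstruction term + lam * contrastive term."""
    return masked_mse(pred, target, mask) + lam * info_nce(views_a, views_b, tau)
```

With aligned views the InfoNCE term approaches zero, while mismatched pairs drive it up; the reconstruction term counts only patches that were hidden from the encoder, which is what distinguishes masked image modeling from plain autoencoding.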

Paper Structure

This paper contains 9 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Workflow of the proposed EndoFinder framework. Endoscopic images are encoded into polyp-aware semantic features and discretised into hash codes for fast retrieval. Decision-making is augmented by referring to the historical information of the matched 'digital twin' polyp in the database.
  • Figure 2: Polyp-aware self-supervised representation learning and inference.
  • Figure 3: Examples of polyp re-identification results. Each row depicts a polyp, showing the query image followed by the top-ranked retrieval results from EndoFinder, pre-trained SSCD, VGG19 and DenseNet121, respectively. Correct retrievals are outlined in red.
  • Figure 4: Examples of image-retrieval based classification by EndoFinder.
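Figure 2 covers the polyp-aware self-supervised stage, and the TL;DR mentions adaptive masking guided by segmentation masks. The exact rule is not described in this summary, so the following Python sketch assumes one plausible version: each image patch is weighted by the fraction of polyp pixels it contains plus a small floor, and patches are sampled for masking without replacement in proportion to those weights. The sampler is Efraimidis-Spirakis weighted sampling, a swapped-in technique; the names `patch_coverage`, `adaptive_mask`, `ratio`, and `floor` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def patch_coverage(seg: Sequence[Sequence[int]], patch: int) -> List[float]:
    """Fraction of polyp pixels (value 1) inside each non-overlapping patch,
    in row-major order; trailing pixels that do not fill a patch are ignored."""
    rows, cols = len(seg) // patch, len(seg[0]) // patch
    coverage = []
    for pr in range(rows):
        for pc in range(cols):
            hits = sum(
                seg[pr * patch + i][pc * patch + j]
                for i in range(patch)
                for j in range(patch)
            )
            coverage.append(hits / (patch * patch))
    return coverage


def adaptive_mask(seg, patch: int, ratio: float, floor: float = 0.05, seed=None) -> List[bool]:
    """Pick round(ratio * num_patches) patches to mask, favoring polyp patches.

    Assumed rule: weight = polyp coverage + floor; each patch gets the key
    log(u) / weight with u uniform in (0, 1], and the largest keys are masked
    (Efraimidis-Spirakis weighted sampling without replacement).
    """
    if floor <= 0:
        raise ValueError("floor must be positive so every patch has a non-zero weight")
    coverage = patch_coverage(seg, patch)
    n_mask = round(ratio * len(coverage))
    rng = random.Random(seed)
    keys = []
    for idx, cov in enumerate(coverage):
        u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
        keys.append((math.log(u) / (cov + floor), idx))
    keys.sort(reverse=True)  # largest keys win
    chosen = {idx for _, idx in keys[:n_mask]}
    return [idx in chosen for idx in range(len(coverage))]
```

Keeping `floor` above zero means background patches are still masked occasionally, so the encoder is not trained on polyp regions alone; shrinking it pushes the sampler toward masking polyp patches first.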