EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang
TL;DR
EndoFinder addresses the need for explainable, real-time polyp diagnosis by reframing optical assessment as content-based image retrieval using a digital-twin polyp reference. It trains a polyp-aware ViT encoder via self-supervised learning that blends masked image modeling and contrastive learning, yielding embeddings $z = E_\phi(I)$ which are binarized for fast retrieval. The retrieval process relies on semantic hashing and a voting rule over the $K$ nearest neighbors to predict pathology, with an adaptive masking strategy guided by segmentation masks to focus on informative polyp regions. Across Polyp-18k, Polyp-Twin, and Polyp-Path, EndoFinder achieves state-of-the-art polyp re-identification and competitive optical biopsy performance while enabling real-time, explainable decision support through a scalable hashing-based framework.
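The retrieval step described above (binarized embeddings, nearest-neighbor search, voting over the $K$ matches) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that binarization is a simple threshold at zero, that neighbors are ranked by Hamming distance, and that the voting rule is a plain majority vote. The function names `binarize`, `hamming_distances`, and `retrieve_and_vote` are hypothetical.

```python
from collections import Counter

import numpy as np


def binarize(z):
    """Turn real-valued embeddings into 0/1 hash codes.

    Assumption: threshold at zero; the summary does not state the exact rule.
    """
    return (np.asarray(z) > 0).astype(np.uint8)


def hamming_distances(query_code, db_codes):
    """Count differing bits between one query code and every database code."""
    return np.count_nonzero(db_codes != query_code, axis=1)


def retrieve_and_vote(query_z, db_z, db_labels, k=3):
    """Return the majority label among the k nearest codes, plus their indices.

    Assumption: an unweighted majority vote stands in for the voting rule.
    """
    query_code = binarize(query_z)
    db_codes = binarize(db_z)
    distances = hamming_distances(query_code, db_codes)
    # A stable sort keeps database order among equally distant neighbors.
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(db_labels[i] for i in nearest)
    predicted = votes.most_common(1)[0][0]
    return predicted, nearest.tolist()
```

Because the function returns the indices of the matched reference polyps alongside the predicted label, a caller can display those matches as the evidence behind the prediction, which is the sense in which a retrieval-based approach is explainable.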
Abstract
Determining the necessity of resecting malignant polyps during colonoscopy screening is crucial for patient outcomes, yet challenging because histopathology examination is time-consuming and costly. While deep learning-based classification models have shown promise in achieving optical biopsy with endoscopic images, they often suffer from a lack of explainability. To overcome this limitation, we introduce EndoFinder, a content-based image retrieval framework that, given a newly detected polyp, finds its 'digital twin' polyp in a reference database. The clinical semantics of the new polyp can then be inferred by referring to the matched ones. EndoFinder pioneers a polyp-aware image encoder that is pre-trained on a large polyp dataset in a self-supervised way, merging masked image modeling with contrastive learning. This results in a generic embedding space ready for different downstream clinical tasks based on image retrieval. We validate the framework on polyp re-identification and optical biopsy tasks, with extensive experiments demonstrating that EndoFinder not only achieves explainable diagnostics but also matches the performance of supervised classification models. EndoFinder's reliance on image retrieval has the potential to support diverse downstream decision-making tasks during real-time colonoscopy procedures.
