Self-Supervised Polyp Re-Identification in Colonoscopy
Yotam Intrator, Natalie Aizenberg, Amir Livne, Ehud Rivlin, Roman Goldenberg
TL;DR
The paper addresses long-term polyp tracking in colonoscopy, which CADx and automated reporting depend on, by introducing a self-supervised, appearance-based re-identification (ReID) framework. It combines an early-fusion, transformer-based multi-view tracklet encoder with a SimCLR-style contrastive objective, and also explores a single-frame representation; both are trained without manual labels. Positives come from temporal views of the same polyp, and pseudo-positives are created by splitting tracklets. The approach improves tracklet grouping, reducing fragmentation and improving CADx accuracy (e.g., AUROC up to $0.77$ for ReID; CADx AUC up to $0.90$ with ReID vs. $0.86$ for tracking), approaching the performance attainable with manually annotated ground truth (GT). These results demonstrate the practical value of appearance-based ReID for data aggregation, reporting, and clinical metrics in colonoscopy. The authors acknowledge limitations when polyp appearance changes during a procedure and suggest broader applications in automated reporting and metric computation.
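To make the training signal concrete, the following is a minimal sketch of how split tracklets could feed a SimCLR-style contrastive loss. It is an illustration under stated assumptions, not the authors' implementation: the function names `split_tracklet` and `nt_xent`, the half-way split point, and the temperature value are all hypothetical, and the loss is the standard NT-Xent formulation from SimCLR, which this summary does not confirm in detail. The tracklet encoder is omitted; embeddings are passed in as plain lists of floats.

```python
import math


def split_tracklet(frames):
    """Split one tracklet into two disjoint halves (assumed split rule).

    The two halves show the same polyp, so they can act as a
    pseudo-positive pair without any manual label.
    """
    mid = len(frames) // 2
    return frames[:mid], frames[mid:]


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(view_a, view_b, temperature=0.1):
    """Standard NT-Xent loss over N pairs (view_a[i], view_b[i]).

    Each embedding is pulled toward its partner and pushed away from
    the other 2N - 2 embeddings in the batch.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / (2 * n)
```

In use, each tracklet in a batch would be split with `split_tracklet`, both halves would be encoded into one embedding each, and the two lists of embeddings would be passed to `nt_xent`. The loss is low when the halves of the same tracklet are embedded close together and halves of different tracklets are far apart.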
Abstract
Computer-aided polyp detection (CADe) is becoming a standard, integral part of any modern colonoscopy system. A typical colonoscopy CADe detects a polyp in a single frame and does not track it through the video sequence. Yet many downstream tasks, including polyp characterization (CADx), quality metrics, and automatic reporting, require aggregating polyp data from multiple frames. In this work we propose a robust long-term polyp tracking method based on re-identification by visual appearance. Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input. We quantitatively evaluate the method's performance and demonstrate its value for the CADx task.
