Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos
Rongfeng Wei, Jinlin Wu, Xuexue Bai, Ming Feng, Zhen Lei, Hongbin Liu, Zhen Chen
TL;DR
We present WS-YOLO, a weakly supervised localization framework for surgical instruments in endoscopic videos that leverages instrument category information as supervision. The method initializes with a category-free detector trained on SIMS to localize instrument parts, then iteratively refines pseudo-labels through cross-detector consistency in a multi-round training loop, reducing annotation requirements while boosting localization accuracy. Experiments on the Endoscopic Vision Challenge 2023 dataset show progressive improvements in mAP from 4.3% to 15.7% across rounds, validating the effectiveness of the approach. The work demonstrates that weak supervision with iterative pseudo-label refinement can achieve competitive instrument localization in realistic surgical video settings, and code is publicly available.
Abstract
In minimally invasive surgery, localizing surgical instruments in endoscopic videos is a crucial task that enables various applications for improving surgical outcomes. However, annotating instrument locations in endoscopic videos is tedious and labor-intensive. In contrast, instrument category information is easy and efficient to obtain in real-world applications. To fully utilize the category information and address the localization problem, we propose WS-YOLO, a weakly supervised localization framework for surgical instruments. Using the instrument category information as weak supervision, WS-YOLO adopts an unsupervised multi-round training strategy to train its localization capability. We validate WS-YOLO on the Endoscopic Vision Challenge 2023 dataset, where it achieves remarkable performance in weakly supervised surgical instrument localization. The source code is available at https://github.com/Breezewrf/WS-YOLO.
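To make the pseudo-label refinement idea concrete, the following is a minimal sketch of one way a cross-detector consistency check could be expressed. It is not taken from the WS-YOLO code base: the IoU-based agreement rule, the 0.5 threshold, and the names `iou` and `consistent_boxes` are all assumptions chosen for illustration. The sketch keeps a box proposed by one detector only if a second detector proposes a sufficiently overlapping box in the same frame.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (illustrative convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consistent_boxes(
    boxes_a: List[Box], boxes_b: List[Box], iou_thresh: float = 0.5
) -> List[Box]:
    """Keep boxes from detector A that some box from detector B agrees with.

    Agreement is defined here (as an assumption) as IoU >= iou_thresh.
    """
    return [
        a for a in boxes_a
        if any(iou(a, b) >= iou_thresh for b in boxes_b)
    ]


# Hypothetical predictions from two detectors on the same frame.
detector_a = [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
detector_b = [(1.0, 1.0, 10.0, 10.0)]
pseudo_labels = consistent_boxes(detector_a, detector_b)
```

In a multi-round setting, the surviving boxes would serve as pseudo-labels for retraining the detectors in the next round. How the actual framework combines this with the category labels is not specified in the text above.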
