Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos

Rongfeng Wei, Jinlin Wu, Xuexue Bai, Ming Feng, Zhen Lei, Hongbin Liu, Zhen Chen

TL;DR

We present WS-YOLO, a weakly supervised framework for localizing surgical instruments in endoscopic videos that uses instrument category information as supervision. The method starts from a category-free detector trained on SIMS to localize instrument parts, then iteratively refines pseudo-labels through cross-detector consistency over multiple training rounds, which reduces annotation requirements while improving localization accuracy. On the Endoscopic Vision Challenge 2023 dataset, mAP improves progressively from 4.3% to 15.7% across rounds. These results indicate that weak supervision with iterative pseudo-label refinement can improve instrument localization in realistic surgical video settings. Code is publicly available.
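
The refinement loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: consistency is taken to mean IoU overlap between boxes from two detectors, and the names `iou`, `consistent_boxes`, `run_rounds`, `retrain`, and the 0.5 threshold are hypothetical rather than taken from the WS-YOLO code. The sketch also leaves out how category labels enter training.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format
Detector = Callable[[object], List[Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consistent_boxes(boxes_a: List[Box], boxes_b: List[Box],
                     iou_thresh: float = 0.5) -> List[Box]:
    """Keep boxes from detector A that some box from detector B agrees with.

    Assumed consistency rule: agreement means IoU >= iou_thresh.
    """
    return [a for a in boxes_a
            if any(iou(a, b) >= iou_thresh for b in boxes_b)]


def run_rounds(frames: Dict[str, object],
               detector_a: Detector,
               detector_b: Detector,
               retrain: Callable[[Dict[str, List[Box]]], Tuple[Detector, Detector]],
               rounds: int = 3,
               iou_thresh: float = 0.5) -> Dict[str, List[Box]]:
    """Alternate pseudo-label filtering and retraining for a fixed number of rounds.

    `retrain` is a hypothetical callback that returns two new detectors
    trained on the current pseudo-labels.
    """
    pseudo_labels: Dict[str, List[Box]] = {}
    for _ in range(rounds):
        pseudo_labels = {
            frame_id: consistent_boxes(detector_a(frame), detector_b(frame), iou_thresh)
            for frame_id, frame in frames.items()
        }
        detector_a, detector_b = retrain(pseudo_labels)
    return pseudo_labels
```

In this sketch, a box is kept as a pseudo-label only when both detectors roughly agree on it, so each retraining round sees a cleaner label set than the raw detections. The actual consistency criterion and training schedule in WS-YOLO may differ.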

Abstract

In minimally invasive surgery, localizing surgical instruments in endoscopic videos is a crucial task that enables various applications for improving surgical outcomes. However, annotating instrument locations in endoscopic videos is tedious and labor-intensive. In contrast, instrument category information is easy and efficient to obtain in real-world applications. To fully utilize the category information and address the localization problem, we propose WS-YOLO, a weakly supervised localization framework for surgical instruments. Using the instrument category information as weak supervision, WS-YOLO adopts an unsupervised multi-round training strategy to learn localization. We validate WS-YOLO on the Endoscopic Vision Challenge 2023 dataset, where it achieves remarkable performance in weakly supervised surgical instrument localization. The source code is available at https://github.com/Breezewrf/WS-YOLO.

Paper Structure

This paper contains 4 sections, 1 equation, 1 figure, 1 table, 1 algorithm.

Figures (1)

  • Figure 1: The overview of our WS-YOLO framework.