Table of Contents
Fetching ...

Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision

Tim J. M. Jaspers, Ronald L. P. D. de Jong, Yasmina Al Khalil, Tijn Zeelenberg, Carolus H. J. Kusters, Yiping Li, Romy C. van Jaarsveld, Franciscus H. A. Bakker, Jelle P. Ruurda, Willem M. Brinkman, Peter H. N. De With, Fons van der Sommen

TL;DR

This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications and shows that increasing diversity within SSL data is beneficial for model performance.

Abstract

Over the past decade, computer vision applications in minimally invasive surgery have rapidly increased. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields like pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has been conventionally the norm to achieve high-performing models, recent advancements in self-supervised learning (SSL) have demonstrated superior performance. In medical image analysis, in-domain SSL pretraining has already been shown to outperform ImageNet-based initialization. Although unlabeled data in the field of surgical computer vision is abundant, the diversity within this data is limited. This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications. The obtained results show that using solely procedure-specific data can lead to substantial improvements of 13.8%, 9.5%, and 36.8% compared to ImageNet pretraining. However, extending this data with more heterogeneous surgical data further increases performance by an additional 5.0%, 5.2%, and 2.5%, suggesting that increasing diversity within SSL data is beneficial for model performance. The code and pretrained model weights are made publicly available at https://github.com/TimJaspers0801/SurgeNet.

Exploring the Effect of Dataset Diversity in Self-Supervised Learning for Surgical Computer Vision

TL;DR

This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications and shows that increasing diversity within SSL data is beneficial for model performance.

Abstract

Over the past decade, computer vision applications in minimally invasive surgery have rapidly increased. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields like pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has been conventionally the norm to achieve high-performing models, recent advancements in self-supervised learning (SSL) have demonstrated superior performance. In medical image analysis, in-domain SSL pretraining has already been shown to outperform ImageNet-based initialization. Although unlabeled data in the field of surgical computer vision is abundant, the diversity within this data is limited. This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications. The obtained results show that using solely procedure-specific data can lead to substantial improvements of 13.8%, 9.5%, and 36.8% compared to ImageNet pretraining. However, extending this data with more heterogeneous surgical data further increases performance by an additional 5.0%, 5.2%, and 2.5%, suggesting that increasing diversity within SSL data is beneficial for model performance. The code and pretrained model weights are made publicly available at https://github.com/TimJaspers0801/SurgeNet.
Paper Structure (9 sections, 5 figures, 2 tables)

This paper contains 9 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Segmentation results (Dice scores) on the three downstream datasets from the fivefold cross-validation. The performance is given for both fine-tuning the entire segmentation model and the model with a frozen encoder.
  • Figure 2: Visual examples from the three downstream test sets. The first two rows display samples from the CholecSeg8k test set, the subsequent two rows show samples from the RAMIE test set, and the final two rows present samples from the RARP test set.
  • Figure 3: 2D-TSNE visualization of learned representations of 1000 randomly sampled images per surgical procedures of SurgeNet and ImageNet.
  • Figure 4: 400 randomly sampled images from SurgeNet.
  • Figure 5: Per-class performance evaluation on three downstream datasets. The top left radar plot displays the results for the CholecSeg8k test set. The top right plot presents the results for the RAMIE test set. The bottom plot illustrates the results for the RARP dataset. The differences are most pronounced for the smaller anatomical structures.