TEMSET-24K: Densely Annotated Dataset for Indexing Multipart Endoscopic Videos using Surgical Timeline Segmentation

Muhammad Bilal, Mahmood Alam, Deepa Bapu, Stephan Korsgen, Neeraj Lal, Simon Bach, Amir M Hajivanand, Muhammed Ali, Kamran Soomro, Iqbal Qasim, Paweł Capik, Aslam Khan, Zaheer Khan, Hunaid Vohra, Massimo Caputo, Andrew Beggs, Adnan Qayyum, Junaid Qadir, Shazad Ashraf

TL;DR

TEMSET-24K is an open-source dataset of 24,306 trans-anal endoscopic microsurgery (TEMS) video micro-clips that provides a benchmark for surgical timeline segmentation in surgical data science.

Abstract

Indexing endoscopic surgical videos is vital in surgical data science, forming the basis for systematic retrospective analysis and clinical performance evaluation. Despite its significance, current video analytics rely on manual indexing, a time-consuming process. Advances in computer vision, particularly deep learning, offer automation potential, yet progress is limited by the lack of publicly available, densely annotated surgical datasets. To address this, we present TEMSET-24K, an open-source dataset comprising 24,306 trans-anal endoscopic microsurgery (TEMS) video micro-clips. Each clip is meticulously annotated by clinical experts using a novel hierarchical labeling taxonomy encompassing phase, task, and action triplets, capturing intricate surgical workflows. To validate this dataset, we benchmarked deep learning models, including transformer-based architectures. Our in silico evaluation demonstrates high accuracy (up to 0.99) and F1 scores (up to 0.99) for key phases like Setup and Suturing. The STALNet model, tested with ConvNeXt, ViT, and SWIN V2 encoders, consistently segmented well-represented phases. TEMSET-24K provides a critical benchmark, propelling state-of-the-art solutions in surgical data science.
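
As a rough illustration of how phase, task, and action triplets attached to micro-clips could be turned into a video index, the sketch below merges consecutive clips that share a phase into timeline segments. It is not the authors' pipeline: the `ClipLabel` class, the `phase_timeline` function, the clip indices, and the particular pairing of phases, tasks, and actions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ClipLabel:
    """Hypothetical record for one annotated micro-clip (not the dataset's schema)."""
    clip_index: int  # position of the clip within its source video
    phase: str       # high-level phase, e.g. "Setup" or "Dissection"
    task: str        # task-level activity, e.g. "scope insertion"
    action: str      # small unit task, e.g. "tissue retraction"


def phase_timeline(clips):
    """Merge consecutive clips with the same phase into
    (phase, first_clip_index, last_clip_index) segments."""
    ordered = sorted(clips, key=lambda c: c.clip_index)
    segments = []
    for phase, group in groupby(ordered, key=lambda c: c.phase):
        run = list(group)
        segments.append((phase, run[0].clip_index, run[-1].clip_index))
    return segments


# Illustrative labels only; the combinations are assumptions, not dataset entries.
clips = [
    ClipLabel(0, "Setup", "scope insertion", "tissue marking"),
    ClipLabel(1, "Setup", "instrument movement", "tissue marking"),
    ClipLabel(2, "Dissection", "instrument movement", "tissue retraction"),
    ClipLabel(3, "Dissection", "site wash", "haemostasis"),
    ClipLabel(4, "Closure", "instrument movement", "tissue retraction"),
]

for segment in phase_timeline(clips):
    print(segment)
```

Grouping on the phase label alone keeps the index coarse; the same function could group on `(phase, task)` instead if a finer-grained timeline is wanted.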

Paper Structure

This paper contains 29 sections, 7 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: TEMS surgical workflow: A typical surgical flow from landmarking of the rectal polyp to dissection, lesion removal, and closure of the rectal wall defect. The key milestones of a TEMS procedure are detailed in images A-J: [A] Baseline lesion in view after setup; [B] Application of landmark dots to outline the lesion; [C] Dissection of the wall through the mucosa and muscle; [D-E] Circumferential removal of the lesion; [F-G] Final removal and extraction of the specimen; [H-I] Closure of the rectal wall defect with a suture; and [J] Application of a metal clip to secure the suture and ensure complete closure.
  • Figure 2: Proposed Taxonomy of TEMS Surgical Workflow. The TEMS operation can be split into three levels: [A] High-level activity phases (such as Set-up, Dissection, Specimen Removal, Closure, and Scope Removal), [B] Task-based activities (such as scope insertion, instrument movement, site wash, and pressure increase), [C] Small unit tasks (such as tissue marking, tissue retraction, smoke identification, bleeding identification, and haemostasis).
  • Figure 3: Side-by-side comparison of videos before and after pre-processing. Panels (a) and (b) show image quality before and after pre-processing, respectively. Despite a reduction in the size of the ESV file by a factor of 10 (from 1 GB to 0.1 GB), there was no loss in quality.
  • Figure 4: Methodology adopted for annotating TEMS surgical videos for surgical timeline segmentation. It includes six major steps: (1) TEMS surgical data acquisition, (2) Data Preprocessing, (3) Data Annotation, (4) Annotation Verification, (5) Data Post Processing, and (6) Surgical data preparation for training timeline segmentation models.
  • Figure 5: Proposed SpatioTemporal Adaptive LSTM Network (STALNet) for Surgical Timeline Segmentation: This network diagram shows the process by which ESV clips are analysed by encoders in order to apply reliable timeline segments.
  • ...and 2 more figures