Learning from Single Timestamps: Complexity Estimation in Laparoscopic Cholecystectomy

Dimitrios Anastasiou, Santiago Barbarisi, Lucy Culshaw, Jayna Patel, Evangelos B. Mazomenos, Imanol Luengo, Danail Stoyanov

TL;DR

This work tackles automated surgical complexity estimation in laparoscopic cholecystectomy (LC) using the Parkland Grading Scale (PGS) under weak temporal supervision. It introduces STC-Net, a pipeline that jointly localizes informative video segments and grades inflammation using a Localization Module, a dynamic Window Proposal Module, and a Grading Module built on MS-TCN, guided by a hybrid localization loss and a background-aware grading loss. On a large private LC video dataset, STC-Net achieves 62.11% accuracy and 61.42% F1, closely matching an upper bound trained on manually trimmed clips while outperforming a non-localized Full baseline by over 10%. Ablations confirm the value of dynamic windows, two-stage training, and the Highest Peak consensus. The approach demonstrates scalable, weakly supervised PGS-based complexity estimation from full videos, with practical implications for postoperative analysis and surgical training.

Abstract

Purpose: Accurate assessment of surgical complexity is essential in Laparoscopic Cholecystectomy (LC), where severe inflammation is associated with longer operative times and increased risk of postoperative complications. The Parkland Grading Scale (PGS) provides a clinically validated framework for stratifying inflammation severity; however, its automation in surgical videos remains largely unexplored, particularly in realistic scenarios where complete videos must be analyzed without prior manual curation.

Methods: In this work, we introduce STC-Net, a novel framework for Single-Timestamp-based Complexity estimation in LC via the PGS, designed to operate under weak temporal supervision. Unlike prior methods limited to static images or manually trimmed clips, STC-Net operates directly on full videos. It jointly performs temporal localization and grading through three modules: localization, window proposal, and grading. We introduce a novel loss formulation that combines hard and soft localization objectives with background-aware grading supervision.

Results: Evaluated on a private dataset of 1,859 LC videos, STC-Net achieves an accuracy of 62.11% and an F1-score of 61.42%, outperforming non-localized baselines by over 10% in both metrics and highlighting the effectiveness of weak supervision for surgical complexity assessment.

Conclusion: STC-Net demonstrates a scalable and effective approach for automated PGS-based surgical complexity estimation from full LC videos, making it promising for postoperative analysis and surgical training.

Paper Structure

This paper contains 11 sections, 5 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Examples of different PGS grades from our dataset. The screenshots correspond to the moment at which the PGS grade was assigned. GB stands for gallbladder.
  • Figure 2: Overview of STC-Net: A frozen CLIP encoder extracts frame features $X \in \mathbb{R}^{T \times D}$. The Localization Module (LM) predicts frame-wise probabilities $\hat{y}_P$, which the Window Proposal Module (WPM) converts into candidate windows $\mathcal{W}$. The Grading Module (GM) integrates $X$, $\hat{y}_P$, and $\mathcal{W}$ to predict the final grade $\hat{c}_{\text{pgs}}$.
  • Figure 3: Confusion matrices for Full, Trimmed (20s), and STC-Net (left to right) on the PGS classification task. Numbers in $[\cdot]$ indicate the number of videos per class.
  • Figure 4: Qualitative comparison of frame-wise localization probabilities $\hat{y}_P$ under different supervision: from left to right, $\mathcal{L}_{\text{bce}}$, $\mathcal{L}_{\text{cos}}$, and $\mathcal{L}_{\text{bce}}+\mathcal{L}_{\text{cos}}$.
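
The data flow in the Figure 2 caption, from frame-wise probabilities $\hat{y}_P$ to candidate windows $\mathcal{W}$, can be illustrated with a small sketch. The text above does not say how the Window Proposal Module forms its windows or how the Highest Peak consensus is computed, so everything here is an assumption made for illustration: the threshold rule, the `min_len` parameter, and the function names `propose_windows` and `highest_peak_window` are hypothetical and are not the authors' implementation.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # half-open frame range [start, end)


def propose_windows(probs: List[float],
                    threshold: float = 0.5,
                    min_len: int = 1) -> List[Window]:
    """Turn frame-wise probabilities into candidate windows.

    ASSUMPTION: a window is a maximal run of consecutive frames whose
    probability is >= threshold. The paper's actual rule may differ.
    """
    windows: List[Window] = []
    start: Optional[int] = None
    for t, p in enumerate(probs):
        if p >= threshold and start is None:
            start = t                      # a run begins
        elif p < threshold and start is not None:
            if t - start >= min_len:       # keep runs that are long enough
                windows.append((start, t))
            start = None                   # the run ends
    # Close a run that extends to the last frame.
    if start is not None and len(probs) - start >= min_len:
        windows.append((start, len(probs)))
    return windows


def highest_peak_window(probs: List[float],
                        windows: List[Window]) -> Optional[Window]:
    """Pick the window containing the largest single-frame probability.

    ASSUMPTION: this is one plausible reading of a "Highest Peak"
    selection rule, not a confirmed description of the method.
    """
    if not windows:
        return None
    return max(windows, key=lambda w: max(probs[w[0]:w[1]]))


# Toy localization output for a 10-frame video (made-up values).
y_hat_p = [0.1, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.95, 0.9, 0.2]
windows = propose_windows(y_hat_p)
best = highest_peak_window(y_hat_p, windows)
print(windows)  # [(2, 4), (6, 9)]
print(best)     # (6, 9)
```

In this toy example two runs exceed the threshold, and the second is selected because it contains the highest probability (0.95). In the pipeline described above, a grading module would then use the features $X$ restricted to the proposed windows, together with $\hat{y}_P$, to predict $\hat{c}_{\text{pgs}}$.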