
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery

Lianhao Yin, Yutong Ban, Jennifer Eckhoff, Ozanan Meireles, Daniela Rus, Guy Rosman

TL;DR

This work introduces a Hypergraph Transformer (HGT) that forecasts interactive events in laparoscopic and robotic surgery from endoscopic video. It encodes surgical knowledge graphs into a graph-based encoder–decoder and predicts future embeddings. The model targets three clinically relevant tasks: action-triplet detection and prediction, Critical View of Safety (CVS) progression, and clipping actions before CVS is achieved, and it demonstrates state-of-the-art performance on public datasets. Key innovations include a hypergraph representation of surgical concepts, time-aware graph message passing, and a transformer-based predictor that handles sparse, high-stakes predictions with a two-phase training scheme and importance sampling. The results suggest the approach can provide timely, safety-relevant warnings and guidance to surgeons, enabling safer, AI-assisted intraoperative decision-making, though the authors acknowledge data scarcity and labeling variability as ongoing challenges.

Abstract

Understanding and anticipating intraoperative events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. Automated prediction of events, actions, and their consequences is addressed through various computational approaches with the objective of augmenting surgeons' perception and decision-making capabilities. We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embedding of the graph. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets, and the achievement of the Critical View of Safety (CVS). Moreover, we address specific, safety-related tasks, such as predicting the clipping of the cystic duct or artery without prior achievement of the CVS. Our results demonstrate the superiority of our approach compared to unstructured alternatives.

Paper Structure

This paper contains 24 sections, 6 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: We propose a framework for event prediction during surgery based on a Hypergraph Transformer (HGT). The model predicts action triplets and surgical safety measures from recent past video frames.
  • Figure 2: Example of nodes and a hyperedge. The subgraph on the right captures the relevant concepts and a hyperedge for the video frame (Roy2020-is).
  • Figure 3: The architecture of the proposed Hypergraph Transformer, which predicts action sequences from past video frames. The encoder encodes the past video frames and the decoder makes the predictions. A pre-trained DINO model serves as the visual backbone. The encoder and decoder apply hypergraph message passing to every graph element separately.
  • Figure 4: Subgraphs for different prediction tasks: a) action-triplets. b) CVS achievement. c) Clipping without prior CVS. d) Clipping.
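
The hyperedge and message-passing ideas named in Figures 2 and 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the concept names, the two-dimensional features, and the unlearned mean aggregation are all assumptions chosen for readability, whereas the paper's model uses learned, time-aware message passing. Each hyperedge here joins the concepts of one hypothetical action triplet (instrument, verb, target).

```python
# Hypothetical hyperedges: each one connects the concept nodes of one
# action triplet. Names are illustrative, not taken from the paper.
hyperedges = {
    "grasper-retract-gallbladder": ["grasper", "retract", "gallbladder"],
    "hook-dissect-gallbladder": ["hook", "dissect", "gallbladder"],
}

# Hypothetical 2-D node features (a real model would use learned embeddings).
node_feats = {
    "grasper": [1.0, 0.0],
    "hook": [0.0, 1.0],
    "retract": [1.0, 0.0],
    "dissect": [0.0, 1.0],
    "gallbladder": [0.5, 0.5],
}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def message_pass(node_feats, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation (unlearned)."""
    # Step 1: each hyperedge summarizes its member nodes.
    edge_feats = {
        edge: mean([node_feats[v] for v in members])
        for edge, members in hyperedges.items()
    }
    # Step 2: each node summarizes the hyperedges it belongs to.
    new_feats = {}
    for v, feat in node_feats.items():
        incident = [edge_feats[e] for e, m in hyperedges.items() if v in m]
        new_feats[v] = mean(incident) if incident else list(feat)
    return edge_feats, new_feats


edge_feats, updated = message_pass(node_feats, hyperedges)
for name, feat in updated.items():
    print(name, [round(x, 3) for x in feat])
```

In this toy setup the shared node (`gallbladder`) receives information from both triplets, while a node that sits in a single hyperedge simply takes on that hyperedge's summary. A learned variant would replace the means with parameterized functions.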