Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery
Lianhao Yin, Yutong Ban, Jennifer Eckhoff, Ozanan Meireles, Daniela Rus, Guy Rosman
TL;DR
This work introduces a Hypergraph-Transformer (HGT) that forecasts interactive events in laparoscopic and robotic surgery from endoscopic video. It encodes surgical knowledge graphs into a graph-based encoder–decoder and predicts future graph embeddings. The model targets three clinically relevant tasks: action-triplet detection and prediction, Critical View of Safety (CVS) achievement, and clipping of the cystic duct or artery before the CVS is achieved. It reports state-of-the-art performance on public datasets. Key innovations include a hypergraph representation of surgical concepts, time-aware graph message passing, and a transformer-based predictor. Sparse, high-stakes predictions are handled with a two-phase training scheme and importance sampling. The results suggest the approach can give surgeons timely, safety-relevant warnings and guidance and support safer AI-assisted intraoperative decision-making. The authors acknowledge data scarcity and labeling variability as ongoing challenges.
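Importance sampling is mentioned as a way to handle sparse, high-stakes events, but its exact form is not given here. The sketch below makes one assumption about what it could look like: training samples are drawn with probability inversely proportional to the frequency of their label, so rare positives such as "clip before CVS" are oversampled. The function names and the toy labels are hypothetical, and the paper's own weighting scheme may differ.

```python
import random
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[int]) -> List[float]:
    """Per-sample weight proportional to 1 / (count of that sample's label).

    Assumption: inverse-frequency weighting is one common way to oversample
    rare events; it is not necessarily the scheme used in the paper.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def sample_batch(labels: List[int], batch_size: int, seed: int = 0) -> List[int]:
    """Draw sample indices with replacement according to the weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Hypothetical toy labels: 1 = "clip before CVS" (rare), 0 = everything else.
labels = [0] * 95 + [1] * 5
weights = inverse_frequency_weights(labels)
# Each class now carries half of the total sampling mass.
print(round(sum(w for w, y in zip(weights, labels) if y == 1), 6))  # 0.5
batch = sample_batch(labels, batch_size=1000)
print(sum(labels[i] for i in batch))  # close to 500 on average
```

With this weighting, each class contributes equally to a batch regardless of how rare it is in the raw data, which is the usual motivation for oversampling rare safety-critical events.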
Abstract
Understanding and anticipating intraoperative events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. Various computational approaches address the automated prediction of events, actions, and their consequences, with the objective of augmenting surgeons' perception and decision-making. We propose a predictive neural network that can understand and predict critical interactive aspects of the surgical workflow from intra-abdominal video while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embeddings of the graph. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets and the achievement of the Critical View of Safety (CVS). Moreover, we address specific safety-related tasks, such as predicting the clipping of the cystic duct or artery without prior achievement of the CVS. Our results demonstrate the superiority of our approach over unstructured alternatives.
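A hypergraph differs from an ordinary graph in that one hyperedge can join more than two concepts at once, for example an action triplet linking an instrument, a verb, and a target. The sketch below shows one round of generic two-stage hypergraph message passing (nodes to hyperedges, then hyperedges back to nodes, with mean aggregation). It is not the authors' HGT layer: the node names, the hyperedge, the mixing weight `alpha`, and the function `hypergraph_message_pass` are all hypothetical, and the time-aware component is omitted.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_message_pass(
    node_feats: Dict[str, Vector],
    hyperedges: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """One round of node -> hyperedge -> node mean aggregation (hypothetical).

    Each node's new feature mixes its own feature (weight 1 - alpha) with the
    mean of the features of all hyperedges it belongs to (weight alpha).
    """
    # Stage 1: each hyperedge summarizes its member nodes.
    edge_feats = {e: mean([node_feats[v] for v in members])
                  for e, members in hyperedges.items()}
    # Stage 2: each node aggregates the hyperedges that contain it.
    updated: Dict[str, Vector] = {}
    for v, x in node_feats.items():
        incident = [edge_feats[e] for e, members in hyperedges.items() if v in members]
        if not incident:
            updated[v] = list(x)  # a node in no hyperedge keeps its feature
            continue
        msg = mean(incident)
        updated[v] = [(1 - alpha) * a + alpha * b for a, b in zip(x, msg)]
    return updated


# Hypothetical toy graph: one hyperedge ties an instrument, a verb, and a target.
nodes = {
    "grasper": [2.0, 0.0],
    "retract": [0.0, 2.0],
    "gallbladder": [1.0, 1.0],
    "clipper": [0.0, 0.0],
}
edges = {"triplet_1": ["grasper", "retract", "gallbladder"]}
out = hypergraph_message_pass(nodes, edges)
print(out["grasper"])  # [1.5, 0.5]
print(out["clipper"])  # [0.0, 0.0]
```

Because the triplet hyperedge aggregates all three members before passing a message back, every member receives context from the other two in a single round, whereas pairwise edges would need several rounds to propagate the same information.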
