CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang, Xin Li, Qiang Li, Zhiwei Wang
TL;DR
CoStoDet-DDPM introduces a collaborative training framework that blends a deterministic surgical workflow predictor with a stochastic DDPM branch to address variability across patients and procedures. By training both branches jointly and discarding the DDPM at inference, the method achieves real-time performance while leveraging stochastic representations to improve anticipation and recognition accuracy. The approach yields state-of-the-art results on Cholec80 and AutoLaparo, with substantial gains in anticipation error and Jaccard for phase recognition, and robust generalization to patient-specific variations. Ablation studies confirm the complementary roles of the two branches and offer practical insights into encoder, temporal span, and diffusion configurations. This work demonstrates that diffusion-based feature enhancement can boost clinical predictive reliability without sacrificing speed, enabling safer and more efficient intraoperative assistance.
Abstract
Anticipating and recognizing surgical workflows are critical for intelligent surgical assistance systems. However, existing methods rely on deterministic decision-making and struggle to generalize across the large anatomical and procedural variations inherent in real-world surgeries. In this paper, we introduce a framework that incorporates stochastic modeling through a denoising diffusion probabilistic model (DDPM) into conventional deterministic learning for surgical workflow analysis. At the heart of our approach is a collaborative co-training paradigm: the DDPM branch captures procedural uncertainties to enrich feature representations, while the task branch focuses on predicting surgical phases and instrument usage. Theoretically, we demonstrate that this mutual refinement mechanism benefits both branches: the DDPM reduces prediction errors in uncertain scenarios, and the task branch directs the DDPM toward clinically meaningful representations. Notably, the DDPM branch is discarded during inference, enabling real-time predictions without sacrificing accuracy. Experiments on the Cholec80 dataset show that for the anticipation task, our method achieves a 16% reduction in eMAE compared to state-of-the-art approaches, and for phase recognition, it improves the Jaccard score by 1.0%. Additionally, on the AutoLaparo dataset, our method achieves a 1.5% improvement in the Jaccard score for phase recognition, while also exhibiting robust generalization to patient-specific variations. Our code and weights are available at https://github.com/kk42yy/CoStoDet-DDPM.
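The co-training idea described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it only shows the standard DDPM forward-noising step and one plausible way to combine a deterministic task loss with a noise-prediction loss. The function names, the linear beta schedule, the use of mean squared error for both terms, and the `weight` parameter are all assumptions made for illustration; what signal the DDPM branch actually diffuses is not specified in this abstract.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def joint_loss(task_pred, task_target, eps_pred, eps_true, weight=1.0):
    """Hypothetical co-training objective: task loss plus weighted noise-prediction loss.

    At inference only the task branch would be evaluated, so the second
    term (and the stochastic branch producing eps_pred) is dropped.
    """
    return mse(task_pred, task_target) + weight * mse(eps_pred, eps_true)
```

In a training loop under these assumptions, a shared encoder would feed both branches, `q_sample` would noise the stochastic branch's target at a random step `t`, and gradients from `joint_loss` would update the encoder through both terms. Discarding the stochastic branch afterwards leaves the inference cost of the deterministic path unchanged.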
