Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer

Siddhant Satyanaik, Aditya Murali, Deepak Alapatt, Xin Wang, Pietro Mascagni, Nicolas Padoy

TL;DR

This work investigates the use of object-centric methods for unseen domain generalization, identifies method-agnostic factors critical for performance, and presents an optimized approach that substantially outperforms existing methods.

Abstract

Purpose: Advances in deep learning have produced effective models for surgical video analysis; however, these models often fail to generalize across medical centers because of domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach to surgical scene understanding: it captures and disentangles the visual and semantic properties of surgical tools and anatomy, which improves downstream task performance. In this work, we conduct a multi-centric performance benchmark of object-centric approaches, focusing on Critical View of Safety (CVS) assessment in laparoscopic cholecystectomy, and then propose an improved approach for unseen domain generalization.

Methods: We evaluate four object-centric approaches for domain generalization to establish baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method tailored specifically for domain generalization, LG-DG, which includes a novel disentanglement loss function.

Results: Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning.

Conclusion: We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
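The benchmark scores each method on frames from a center that was not seen during training. As a rough illustration of how such a zero-shot comparison could be scored, the sketch below computes average precision per criterion and averages it over the CVS criteria. It assumes, since the summary above does not state the metric, that results are reported as mean average precision over three binary CVS criteria; `average_precision` and `mean_average_precision` are hypothetical helper names, and tied scores are ranked in input order.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive frame."""
    # Rank frames by descending score (stable sort, so ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("AP is undefined when there are no positive labels")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """Mean AP over criteria; scores[i][c] is frame i's score for criterion c."""
    num_criteria = len(scores[0])
    aps = [
        average_precision([s[c] for s in scores], [y[c] for y in labels])
        for c in range(num_criteria)
    ]
    return sum(aps) / num_criteria


if __name__ == "__main__":
    # Toy data: 4 frames x 3 criteria (made-up numbers for illustration only).
    scores = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.6], [0.7, 0.3, 0.7], [0.6, 0.4, 0.8]]
    labels = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
    print(round(mean_average_precision(scores, labels), 4))  # 0.6944
```

In a zero-shot transfer setting, the same function would simply be called on predictions for the held-out center, with no retraining or adaptation in between.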

Paper Structure

This paper contains 15 sections, 5 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Qualitative examples illustrating the visual domain gap between Endoscapes2023 and Endoscapes-WC70. The images differ in aspect ratio, color distribution, and field of view; these differences may be caused by variations in the laparoscope used and in the surgical workflow.
  • Figure 2: An illustration of a generic object-centric method.
  • Figure 3: Three different examples of the masked latent graph $\hat{G}_{\text{CVS}}$ that is passed to the downstream classification head $\phi_{\text{CVS}}$ for CVS prediction. Each node in the graph corresponds to an object in the image. The masking operation, while pictured for a single node, is applied to all nodes.
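Figure 3 describes a mask that is applied uniformly to every node of $\hat{G}_{\text{CVS}}$ before the graph reaches $\phi_{\text{CVS}}$. The caption does not say what the mask removes; the sketch below assumes, in line with the ablations described in the abstract, that it zeroes either the visual or the semantic part of each node's features. `ObjectNode`, `mask_graph`, and `cvs_head` are hypothetical names, edges are omitted, and mean pooling followed by one sigmoid per criterion is only a stand-in for whatever graph network $\phi_{\text{CVS}}$ actually is.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectNode:
    """One latent-graph node, i.e. one detected object (hypothetical layout)."""
    visual: List[float]    # appearance embedding, e.g. pooled backbone features
    semantic: List[float]  # e.g. class one-hot plus box coordinates


def mask_graph(nodes: List[ObjectNode], drop_visual: bool = False,
               drop_semantic: bool = False) -> List[List[float]]:
    """Apply the same feature mask to every node; return concatenated features."""
    masked = []
    for node in nodes:
        visual = [0.0] * len(node.visual) if drop_visual else list(node.visual)
        semantic = [0.0] * len(node.semantic) if drop_semantic else list(node.semantic)
        masked.append(visual + semantic)
    return masked


def cvs_head(node_features: List[List[float]], weights: List[List[float]],
             biases: List[float]) -> List[float]:
    """Toy stand-in for phi_CVS: mean-pool nodes, then one sigmoid per criterion."""
    dim = len(node_features[0])
    pooled = [sum(f[d] for f in node_features) / len(node_features) for d in range(dim)]
    probs = []
    for row, bias in zip(weights, biases):
        logit = sum(w * x for w, x in zip(row, pooled)) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


if __name__ == "__main__":
    nodes = [ObjectNode([1.0, 2.0], [1.0, 0.0]), ObjectNode([3.0, 4.0], [0.0, 1.0])]
    semantic_only = mask_graph(nodes, drop_visual=True)
    print(semantic_only)  # [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    zero_weights = [[0.0] * 4 for _ in range(3)]
    print(cvs_head(semantic_only, zero_weights, [0.0, 0.0, 0.0]))  # [0.5, 0.5, 0.5]
```

Keeping the two feature groups in separate fields makes each ablation a single flag, which mirrors how a disentangled representation lets one feature type be removed without touching the other.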