EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita

TL;DR

The paper proposes a gaze-guided masked autoencoder (GGMAE), inspired by the success of masked autoencoders (MAEs) in video understanding tasks, that significantly improves on both the previous state-of-the-art recognition method and the masked autoencoder-based method on EgoSurgery-Phase.

Abstract

Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured using an egocentric camera attached to the surgeon's head. In addition to video, EgoSurgery-Phase provides eye-gaze data. As far as we know, it is the first publicly available real open surgery video dataset for surgical phase recognition. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Because the regions on which the surgeon's gaze focuses (e.g., the surgical field) are often critical for surgical phase recognition, GGMAE uses gaze information as an empirical semantic-richness prior that guides the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves on the previous state-of-the-art recognition method (by 6.4% in Jaccard) and on the masked autoencoder-based method (by 3.1% in Jaccard) on EgoSurgery-Phase. The dataset is released at https://github.com/Fujiry0/EgoSurgery.
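
The gains above are reported in Jaccard. As a rough illustration of what that metric measures for frame-wise phase labels, the sketch below computes a per-phase Jaccard index (intersection over union of frames) and averages it over the phases that occur. The function name `phase_jaccard` and the toy labels are hypothetical, and the paper's exact evaluation protocol (for example, whether scores are averaged per video) is not stated here, so this is an assumption rather than the authors' evaluation code.

```python
def phase_jaccard(y_true, y_pred, num_phases):
    """Mean per-phase Jaccard index over frame-wise phase labels.

    Phases that appear in neither the ground truth nor the prediction
    are skipped so they do not distort the average.
    """
    scores = []
    for phase in range(num_phases):
        # Frames where both ground truth and prediction equal this phase.
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == phase and p == phase)
        # Frames where either ground truth or prediction equals this phase.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == phase or p == phase)
        if union > 0:
            scores.append(inter / union)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: six frames, three phases (labels are illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
mean_jaccard = phase_jaccard(y_true, y_pred, num_phases=3)
```

In the toy example the per-phase scores are 1/3, 2/3, and 1/2, so the mean is 0.5.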

Paper Structure (16 sections, 3 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Illustration of 9 surgical phases (P1-P9) annotated in the EgoSurgery-Phase dataset. Typically, the phases are executed sequentially from P1 to P9.
  • Figure 2: Example of an RGB image and gaze heatmap from EgoSurgery-Phase, along with the corresponding random mask and gaze-guided mask. The gaze heatmap is overlaid on the RGB image for visualization purposes.
  • Figure 3: The phase distribution of frames.
  • Figure 4: Overview of the proposed GGMAE: GGMAE masks tokens and reconstructs the masked tokens with a Transformer encoder-decoder architecture. Considering that open surgery videos often contain non-informative regions, we introduce the Gaze-Guided Masking (GGM) module, which selects tokens to be masked based on gaze information.
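
To make the idea of selecting masked tokens from gaze information concrete, here is a minimal sketch on a single frame. It assumes that tokens with more gaze are masked with higher probability. That direction, the patch size, the mask ratio, and the names `patch_gaze_scores` and `gaze_guided_mask` are all illustrative assumptions, not the authors' GGM module. The actual method operates on video tokens and may differ in how it samples.

```python
import math
import random


def patch_gaze_scores(heatmap, patch):
    """Average a 2D gaze heatmap (list of rows) over non-overlapping patches."""
    height, width = len(heatmap), len(heatmap[0])
    scores = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            total = sum(
                heatmap[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            )
            scores.append(total / (patch * patch))
    return scores


def gaze_guided_mask(scores, mask_ratio, rng, eps=1e-6):
    """Sample tokens to mask without replacement, weighted by gaze score.

    Assumption: higher gaze score means higher masking probability.
    Uses Efraimidis-Spirakis weighted sampling in log form: each token
    gets key log(u) / weight, and the tokens with the largest keys are kept.
    """
    num_tokens = len(scores)
    num_mask = int(round(num_tokens * mask_ratio))
    keys = []
    for index, score in enumerate(scores):
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        keys.append((math.log(u) / (score + eps), index))
    keys.sort(reverse=True)
    masked = {index for _, index in keys[:num_mask]}
    return [index in masked for index in range(num_tokens)]


# Toy 4x4 heatmap with all gaze mass in the top-left 2x2 patch.
heatmap = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
scores = patch_gaze_scores(heatmap, patch=2)
mask = gaze_guided_mask(scores, mask_ratio=0.5, rng=random.Random(0))
```

A uniformly random mask would correspond to giving every token the same weight. The small `eps` keeps tokens with zero gaze selectable, so the requested mask ratio can always be met.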