TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis

Xinyu Liu, Xiaoguang Lin, Xiang Liu, Yong Yang, Hongqian Wang, Qilong Sun

TL;DR

A supervised time-series prediction framework to automatically select the most informative camera views, ensuring better coverage of critical steps in open thyroidectomy procedures and providing an initial exploration of multi-view camera selection for thyroidectomy.

Abstract

Recording open surgery is essential for education and medical evaluation. Traditional single-camera methods, however, suffer from occlusions caused by the surgeon's head and body and from the limitations of fixed camera angles, both of which reduce the comprehensibility of the video content. This study addresses these limitations with a multi-viewpoint camera recording system that captures the surgical procedure from six different angles to mitigate occlusions. We propose a fully supervised, learning-based time-series prediction method that chooses the best shot sequences from multiple simultaneously recorded video streams, ensuring an optimal viewpoint at each moment. Our time-series prediction model forecasts future camera selections by extracting and fusing visual and semantic features from surgical videos using pre-trained models. These features are processed by a temporal prediction network with TimesBlock modules to capture sequential dependencies. A linear embedding layer reduces dimensionality, and a Softmax classifier selects the camera view with the highest probability. In our experiments, we created five groups of open thyroidectomy videos, each with simultaneous recordings from six different angles. The results demonstrate that our method achieves competitive accuracy compared to traditional supervised methods, even when predicting over longer time horizons. Furthermore, our approach outperforms state-of-the-art time-series prediction techniques on our dataset. The proposed framework advances surgical video analysis techniques, with implications for improving surgical education and patient safety.
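The final selection step described in the abstract (a Softmax classifier picking the camera with the highest probability) can be illustrated with a minimal sketch. The logit values, the function names `softmax` and `select_cameras`, and the three-step horizon are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import math

NUM_CAMERAS = 6  # six simultaneous viewpoints, as stated in the abstract


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_cameras(logits_per_step):
    """Return (camera_index, probability) for each predicted time step."""
    choices = []
    for logits in logits_per_step:
        if len(logits) != NUM_CAMERAS:
            raise ValueError("expected one logit per camera")
        probs = softmax(logits)
        best = max(range(NUM_CAMERAS), key=probs.__getitem__)
        choices.append((best, probs[best]))
    return choices


# Hypothetical classifier outputs for three future time steps.
example_logits = [
    [0.1, 2.0, 0.3, -1.0, 0.0, 0.5],
    [1.5, 0.2, 0.1, 0.0, 2.5, -0.5],
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
]
selected = select_cameras(example_logits)
print([cam for cam, _ in selected])  # prints [1, 4, 3]
```

Cameras are indexed from 0 here; the probability returned alongside each index could serve as a confidence score, although the abstract does not say whether the authors use it that way.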

Paper Structure

This paper contains 17 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Multi-viewpoint cameras mounted on the shadowless surgical lamp allow the surgical procedure to be recorded simultaneously from six distinct perspectives. The shadowless lamp ensures consistent illumination, eliminating shadows in the surgical field and enabling each camera to capture critical procedural details precisely.
  • Figure 2: Annotation software interface, which displays the images from all six camera angles at once and allows the annotator to select the best angle by clicking on its image.
  • Figure 3: The overall architecture of the end-to-end time-series prediction model for multi-angle camera selection in open surgery: (a) Feature Extraction: A pre-trained ResNet-18 model is employed to extract visual features, while semantic features are extracted using the YOLOv5s model. These features are then integrated as inputs to the temporal prediction network. (b) Feature Transformation: Dimensionality reduction is performed on the high-dimensional feature vectors using a linear embedding layer, with temporal information incorporated to enhance the model's ability to capture sequential dependencies. (c) Optimal Viewpoint Prediction: The TimesBlock modules process the time-series data, and the Softmax classifier generates a probability distribution over the possible camera labels, from which the optimal label is selected.
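
Stages (a) and (b) of Figure 3 can be sketched in plain Python under several loudly stated assumptions. The caption does not say how the visual and semantic features are integrated or how temporal information is incorporated; this sketch assumes concatenation for fusion and an additive sinusoidal positional encoding for the temporal signal. The feature sizes, the random weights, and every function name below are hypothetical stand-ins, not the authors' code.

```python
import math
import random


def fuse(visual, semantic):
    """Assumed fusion: concatenate visual and semantic feature vectors."""
    return list(visual) + list(semantic)


def make_linear(in_dim, out_dim, seed=0):
    """Create random weights and zero bias for a linear embedding layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(x, weights, bias):
    """Apply y = W x + b to one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def positional_encoding(t, dim):
    """Assumed temporal signal: sinusoidal encoding of time step t."""
    enc = []
    for i in range(dim):
        even = i - (i % 2)  # sine/cosine pairs share one frequency
        angle = t / 10000 ** (even / dim)
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def embed_sequence(visual_seq, semantic_seq, weights, bias):
    """Fuse, reduce dimensionality, and add temporal information per step."""
    embedded = []
    for t, (vis, sem) in enumerate(zip(visual_seq, semantic_seq)):
        z = linear(fuse(vis, sem), weights, bias)
        pe = positional_encoding(t, len(z))
        embedded.append([a + b for a, b in zip(z, pe)])
    return embedded


# Toy sizes chosen for readability; real extractor outputs are far larger.
VISUAL_DIM, SEMANTIC_DIM, EMBED_DIM, STEPS = 8, 4, 6, 5
rng = random.Random(42)
visual_seq = [[rng.random() for _ in range(VISUAL_DIM)] for _ in range(STEPS)]
semantic_seq = [[rng.random() for _ in range(SEMANTIC_DIM)] for _ in range(STEPS)]
weights, bias = make_linear(VISUAL_DIM + SEMANTIC_DIM, EMBED_DIM)
embedded = embed_sequence(visual_seq, semantic_seq, weights, bias)
print(len(embedded), len(embedded[0]))  # prints 5 6
```

The resulting sequence of embedded vectors is the kind of input that a temporal network such as the TimesBlock modules in stage (c) would consume; that stage is left out here because the caption gives no detail on its internals.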