Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery

Yuna Kato, Shohei Mori, Hideo Saito, Yoshifumi Takatsume, Hiroki Kajita, Mariko Isogawa

TL;DR

Open surgery video capture is hindered by occlusion; the authors automate frame alignment and camera switching across a multi-camera shadowless lamp (McSL) to produce a stable single-view video. The method combines automatic movement detection, robust multi-camera calibration, and view warping to minimize occlusion, plus optional centering and pixel filling to enhance visibility. In evaluations with surgeons, the auto-alignment approach outperforms manual-alignment and no-alignment baselines in reducing misalignment and in ease of viewing, and surgeons expressed preferences among the selectable visual aids. The work offers practical improvements for educational and research video quality in open surgery while outlining avenues for further robustness and artifact mitigation.

Abstract

Video recordings of open surgeries are in high demand for education and research. However, capturing unobstructed videos is challenging since surgeons frequently block the camera field of view. To avoid occlusion, the positions and angles of the camera must be frequently adjusted, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area. This setup increases the chances of some cameras capturing an unobstructed view. However, manual image alignment is needed in post-processing since camera configurations change every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort during video viewing. Additionally, our approach showed improvements in video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.
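The camera-selection step in the abstract ("selects the camera with the least occlusion") can be illustrated as a per-frame argmin over occlusion scores. This is a minimal sketch, not the authors' implementation: how occlusion is scored (here, a hypothetical value in [0, 1] giving the fraction of the surgical field hidden in each camera) and the function name `select_least_occluded` are assumptions.

```python
from typing import List, Sequence


def select_least_occluded(occlusion: Sequence[Sequence[float]]) -> List[int]:
    """Return, for every frame, the index of the camera with the lowest occlusion score.

    occlusion[t][c] is a hypothetical occlusion score for camera c at frame t
    (0.0 = surgical field fully visible, 1.0 = fully hidden). Ties are broken
    in favor of the lowest camera index, because min() keeps the first minimum.
    """
    choices: List[int] = []
    for t, scores in enumerate(occlusion):
        if len(scores) == 0:
            raise ValueError(f"frame {t} has no camera scores")
        best = min(range(len(scores)), key=lambda c: scores[c])
        choices.append(best)
    return choices


# Three cameras over two frames: camera 1 wins frame 0; cameras 0 and 2 tie in frame 1.
print(select_least_occluded([[0.4, 0.1, 0.3], [0.0, 0.2, 0.0]]))  # [1, 0]
```

A practical system would likely add hysteresis or a minimum dwell time so the output does not flicker between cameras with nearly equal scores; that refinement is likewise an assumption, not something stated here.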

Paper Structure

This paper contains 12 sections, 6 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Multi-camera shadowless lamp (McSL). (a) The system comprises five units, each with six light sources and a camera (bottom). (b) The lamp's location and focus are adjusted to the surgical field by moving the central knob, so the camera configuration changes over time. Just as occluded light sources in a conventional shadowless lamp are compensated for by the others, occluded cameras in the McSL are compensated for by the remaining cameras. (c) To avoid occlusions in output videos, a previous approach (Shimizu) switches from an occluded camera to one with fewer occlusions; however, because the cameras view the field from different positions around the lamp, the output view rotates over time. Our approach automatically aligns frames, centers the surgical field to stabilize the output, and detects the frames in which the lamp moves so that the camera configuration can be recalibrated. We further apply visual aids that re-center the view and inpaint missing pixels.
  • Figure 2: Overview of the proposed method. $\mathbf{X}$, $\mathbf{X}'$, $\mathbf{Y}$, and $\mathbf{Y}'$ denote all input, aligned, stabilized, and enhanced video frames, respectively.
  • Figure 3: Robust McSL movement detection and homography estimation. (a) Whereas the previous approach (Obayashi) requires the video viewer to find consecutive video frames for McSL calibration, our algorithm detects such frames automatically. We vote for detected frames in each bin of $m$ intervals and take the median as $t_\text{mov}$ (i.e., the McSL moved at $t_\text{mov}$). (b) Superimposed images from the five cameras (left) before and (right) after the McSL moves. The red circles highlight example feature points of a common scene point (here, a big toe), which are used to calculate the degree of misalignment, $d_{\text{DOM}, t}$, that indicates McSL movement. (c) We distinguish the surgical field from other regions by hue to collect non-occluded frames for stable homography calculation. The red highlights represent the detected surgical fields ($S_t$).
  • Figure 4: Auto-alignment with respect to camera 1 over time. The entire surgery lasts 1:39:19; the timestamps 0:16:59 and 0:36:50 in the figure mark the frames detected by camera movement detection, i.e., the moments at which realignment becomes necessary.
  • Figure 5: Procedure for combining two images when filling missing regions. The foreground image is the video $\mathbf{Y}$ with missing pixels, and the background image is the video from the warp destination viewpoint (Camera 1 in our setup). First, regions of the foreground image with pixel values above 10 are extracted and blurred to generate an alpha mask. Using this mask, the foreground and background are alpha-blended so that the boundaries transition smoothly. Pixels that remain missing after this process are filled with blurred pixel values from the previous frame.
  • ...and 4 more figures
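Figure 1(c) mentions centering the surgical field to stabilize the output. One simple way to sketch this is to translate each frame so that the centroid of a surgical-field mask lands at the image center, smoothing the translation over time. This is an illustrative assumption, not the paper's stated algorithm: the centroid rule, the exponential moving average, and the names `centering_shift`, `smooth_shifts`, and `beta` are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Shift = Tuple[float, float]


def centering_shift(mask: Sequence[Sequence[bool]]) -> Optional[Shift]:
    """Translation (dx, dy) that moves the centroid of the surgical-field mask to the
    image center, or None if no field pixel is visible."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    sx = sy = 0.0
    n = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                sx += x
                sy += y
                n += 1
    if n == 0:
        return None
    return ((w - 1) / 2 - sx / n, (h - 1) / 2 - sy / n)


def smooth_shifts(shifts: Sequence[Optional[Shift]], beta: float = 0.9) -> List[Shift]:
    """Exponential moving average of per-frame shifts; frames without a visible field
    reuse the last smoothed value (or no shift before any field is seen)."""
    out: List[Shift] = []
    state: Optional[Shift] = None
    for s in shifts:
        if s is not None:
            state = s if state is None else (beta * state[0] + (1 - beta) * s[0],
                                              beta * state[1] + (1 - beta) * s[1])
        out.append(state if state is not None else (0.0, 0.0))
    return out


print(centering_shift([[True, False, False], [False, False, False], [False, False, False]]))  # (1.0, 1.0)
print(smooth_shifts([(2.0, 0.0), None, (4.0, 0.0)], beta=0.5))  # [(2.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
```

A larger `beta` gives a steadier view but reacts more slowly when the field genuinely moves.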
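Figure 3(a) describes voting for detected frames in bins of $m$ intervals and taking the median as $t_\text{mov}$. Under one plausible reading (an assumption: each bin with enough votes yields one movement event, which allows several events such as the two in Figure 4), the step can be sketched as follows. The names `movement_times` and `min_votes` are hypothetical.

```python
from collections import defaultdict
from statistics import median_low
from typing import Dict, Iterable, List


def movement_times(detections: Iterable[int], m: int, min_votes: int = 1) -> List[int]:
    """Estimate McSL movement times t_mov from per-frame movement detections.

    Each detected frame ID votes for the bin of width m that contains it; every
    bin with at least min_votes votes yields one t_mov, taken as the (low) median
    of its votes so that the result is always an actual frame ID.
    """
    if m <= 0:
        raise ValueError("m must be a positive number of frames")
    bins: Dict[int, List[int]] = defaultdict(list)
    for t in detections:
        bins[t // m].append(t)
    return [median_low(bins[b]) for b in sorted(bins) if len(bins[b]) >= min_votes]


# Two clusters of detections survive; the isolated detection at frame 1500 is discarded.
print(movement_times([100, 102, 105, 1001, 1003, 1500], m=50, min_votes=2))  # [102, 1001]
```

Fixed bins can split one movement event across a bin boundary; a sliding window or merging adjacent bins would avoid that at the cost of another parameter.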
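Figure 3(b) computes the degree of misalignment $d_{\text{DOM}, t}$ from feature points of a common scene point seen by all cameras. A minimal sketch, assuming $d_{\text{DOM}, t}$ is the mean pixel distance between the reference camera's feature and the other cameras' features after warping them with the homographies from the last calibration (the exact definition in the paper may differ), is shown below; a frame whose value exceeds a hypothetical threshold becomes a movement candidate.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    return (u / w, v / w)


def degree_of_misalignment(ref: Point, points: Sequence[Point],
                           homographies: Sequence[Matrix3]) -> float:
    """Mean distance between the reference feature point and each other camera's
    matching feature point warped into the reference view."""
    if len(points) != len(homographies) or not points:
        raise ValueError("need one homography per point and at least one point")
    total = sum(math.dist(apply_homography(H, p), ref) for p, H in zip(points, homographies))
    return total / len(points)


def candidate_frames(dom_per_frame: Sequence[float], threshold: float) -> List[int]:
    """Frame IDs whose misalignment exceeds the (hypothetical) threshold."""
    return [t for t, d in enumerate(dom_per_frame) if d > threshold]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift = [[1, 0, 3], [0, 1, 4], [0, 0, 1]]
print(degree_of_misalignment((10.0, 10.0), [(7.0, 6.0)], [shift]))  # 0.0
print(degree_of_misalignment((10.0, 10.0), [(10.0, 10.0), (13.0, 14.0)], [identity, identity]))  # 2.5
```

While the lamp is still, the stored homographies keep the warped points on top of the reference point; once the lamp moves, the stale homographies no longer line them up and the distance jumps.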
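Figure 3(c) separates the surgical field from other regions by hue to collect non-occluded frames. The sketch below assumes the field is reddish tissue while drapes, gloves, and heads are not, and keeps frames in which enough of the image is classified as field. The hue window, the saturation cutoff, `min_ratio`, and all function names are hypothetical parameters chosen for illustration, not values from the paper.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Frame = Sequence[Sequence[RGB]]


def surgical_field_mask(frame: Frame, hue_width: float = 0.05,
                        min_sat: float = 0.3) -> List[List[bool]]:
    """Mark pixels whose hue is close to red and whose saturation is high enough.

    Hue from colorsys wraps around at 1.0, so red lies near both 0.0 and 1.0.
    The saturation test rejects grey/white pixels (e.g. gloves, gauze), whose hue is 0.
    """
    mask = []
    for row in frame:
        mask_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append((h <= hue_width or h >= 1.0 - hue_width) and s >= min_sat)
        mask.append(mask_row)
    return mask


def field_ratio(mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of pixels labeled as surgical field."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def non_occluded_frames(frames: Sequence[Frame], min_ratio: float = 0.2) -> List[int]:
    """Indices of frames whose visible surgical field covers at least min_ratio of the image."""
    return [t for t, f in enumerate(frames) if field_ratio(surgical_field_mask(f)) >= min_ratio]


tissue, drape, glove = (200, 30, 30), (30, 150, 60), (240, 240, 240)
frames = [
    [[tissue, tissue], [drape, glove]],  # half the pixels are tissue
    [[drape, drape], [glove, glove]],    # field fully hidden
]
print(non_occluded_frames(frames))  # [0]
```

Only the frames returned here would then be used to fit the homographies, so that hands and instruments do not corrupt the feature matches.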
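The filling procedure in Figure 5 (threshold the foreground at 10, blur the result into an alpha mask, alpha-blend with the background, then fall back to the blurred previous frame) can be sketched on grayscale images as below. Assumptions: single-channel intensities with 0 meaning a missing pixel, a 3x3 box blur standing in for whatever blur the authors use, and the hypothetical names `box_blur` and `fill_missing`.

```python
from typing import List

Image = List[List[float]]  # grayscale image; 0 marks a missing pixel


def box_blur(img: Image, radius: int = 1) -> Image:
    """Mean filter over a (2*radius+1)^2 window, clipped at the image borders."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    acc += img[yy][xx]
                    n += 1
            out[y][x] = acc / n
    return out


def fill_missing(fg: Image, bg: Image, prev: Image,
                 threshold: float = 10.0, radius: int = 1) -> Image:
    """Blend the stabilized frame (fg) over the warp-destination view (bg), then
    fall back to the blurred previous output (prev) where pixels are still missing."""
    h = len(fg)
    w = len(fg[0]) if h else 0
    # 1. Pixels above the threshold count as valid; blurring the binary mask gives a soft alpha.
    valid = [[1.0 if fg[y][x] > threshold else 0.0 for x in range(w)] for y in range(h)]
    alpha = box_blur(valid, radius)
    # 2. Alpha-blend foreground and background so the seam is smooth.
    blended = [[alpha[y][x] * fg[y][x] + (1.0 - alpha[y][x]) * bg[y][x] for x in range(w)]
               for y in range(h)]
    # 3. Anything still at or below the threshold takes the blurred previous frame.
    prev_blur = box_blur(prev, radius)
    return [[blended[y][x] if blended[y][x] > threshold else prev_blur[y][x] for x in range(w)]
            for y in range(h)]


print(fill_missing([[0.0] * 2] * 2, [[50.0] * 2] * 2, [[30.0] * 2] * 2))  # [[50.0, 50.0], [50.0, 50.0]]
print(fill_missing([[0.0] * 2] * 2, [[0.0] * 2] * 2, [[30.0] * 2] * 2))   # [[30.0, 30.0], [30.0, 30.0]]
```

Blurring the mask makes the transition gradual on both sides of the boundary, so a few dark foreground pixels also leak into the blend just outside the valid region; eroding the mask before blurring is one common way to reduce that, though it is not described in the figure.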