Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors

Han Zhang, Lalithkumar Seenivasan, Jose L. Porras, Roger D. Soberanis-Mukul, Hao Ding, Hongchao Shu, Benjamin D. Killeen, Ankita Ghosh, Lonny Yarmus, Masaru Ishii, Angela Christine Argento, Mathias Unberath

TL;DR

EgoSurg is the first framework to reconstruct dynamic, egocentric replays for any operating room (OR) staff member directly from wall-mounted fixed-camera video, and thus without intervening in the clinical workflow, establishing a new foundation for immersive surgical data science.

Abstract

Observing surgical practice has historically relied on fixed vantage points or recollections, leaving the egocentric visual perspectives that guide clinical decisions undocumented. Fixed-camera video can capture surgical workflows at room scale, but cannot reconstruct what each team member actually saw. These videos therefore provide only limited insight into how decisions that affect surgical safety, training, and workflow optimization are made. Here we introduce EgoSurg, the first framework to reconstruct dynamic, egocentric replays for any operating room (OR) staff member directly from wall-mounted fixed-camera video, and thus without intervening in the clinical workflow. EgoSurg couples geometry-driven neural rendering with diffusion-based view enhancement, enabling high-fidelity synthesis of arbitrary and egocentric viewpoints at any moment. In evaluations across multi-site surgical cases and controlled studies, EgoSurg reconstructs person-specific visual fields and arbitrary viewpoints with high visual quality and fidelity. By transforming existing OR camera infrastructure into a navigable dynamic 3D record, EgoSurg establishes a new foundation for immersive surgical data science, enabling surgical practice to be visualized, experienced, and analyzed from every angle.

Paper Structure

This paper contains 26 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: EgoSurg enables arbitrary-perspective visualization of the OR. (a) Conventional fixed cameras capture only narrow and occlusion-prone views of the surgical workflow, leaving critical interactions invisible or ambiguous. (b) EgoSurg integrates video from sparse wall-mounted ambient cameras into a dynamic 3D scene, from which virtual viewpoints can be placed anywhere in the room or aligned with any team member. This perspective-agnostic framework overcomes the limitations of fixed or wearable cameras, enabling retrospective egocentric replays that faithfully reconstruct what each role could have seen at decisive moments.
  • Figure 2: Methodology and qualitative results of EgoSurg on real patient data. (a) Overview of the EgoSurg pipeline: Video from sparse ceiling-mounted cameras is integrated into a dynamic 3DGS representation, from which virtual egocentric viewpoints can be synthesized. (b) Reconstruction process: Stereo depth maps and camera calibration generate an initial sparse point cloud, which seeds a 3DGS representation that is continuously optimized with reference images. A diffusion model augments this process by generating auxiliary views, mitigating occlusions and enforcing cross-view consistency. (c) Example of diffusion-based repair: Corrupted or incomplete reference views are restored into semantically consistent frames, providing more complete supervision for reconstruction. (d) Qualitative results on robotic pulmonology procedures: Ambient camera views (top) and synthesized egocentric perspectives (bottom) across different surgical phases reveal fine-grained interactions and dynamic team coordination. (e) Novel-view image fidelity evaluation: EgoSurg substantially outperforms baselines (naïve 3DGS and depth reprojection) in PSNR and SSIM, demonstrating improved spatial coherence and temporal stability. (f) Egocentric accuracy: Synthesized views closely match ground-truth handheld recordings (left), with quantitative fidelity scores of PSNR $17.79\pm1.97$ dB and SSIM $0.766\pm0.026$ (right).
  • Figure 3: EgoSurg reveals sterile-field violations through egocentric reconstruction. (a) Four ambient ceiling-mounted cameras provide complementary but partially occluded views of the OR. (b) EgoSurg reconstructs a 3D scene with a simulated sterile field, where an arm makes contact with the boundary. A virtual observer can be placed to identify optimal vantage points, complementing human observations. (c) Egocentric replays from different team members, together with zoomed-in views of the contact region, unambiguously expose the violation that would otherwise remain hidden or anecdotal. This perspective-agnostic reconstruction converts ambiguous events into actionable safety evidence.
  • Figure 4: EgoSurg enables immersive training through role-specific egocentric perspectives. (a) Four ambient ceiling-mounted cameras capture a robotic pulmonology case during the endobronchial ultrasound phase. (b) The reconstructed 3D scene shows the surgical team gathered around a fluoroscopy monitor, with the surgeon performing the procedure and the nurse documenting the process. Virtual egocentric views reveal what each role could see at that moment. (c) Temporal replay of the surgeon’s trajectory: The top view illustrates movement paths within the OR, while synthesized egocentric views on the right capture evolving attention to instruments and monitors. Together, these reconstructions allow learners to experience critical moments from the perspective of different team members, supporting faster skill acquisition and shared situational awareness.
  • Figure 5: EgoSurg identifies workflow bottlenecks and enables counterfactual optimization. (a) Four ambient ceiling-mounted cameras capture a robotic pulmonology case under C-arm fluoroscopy. (b) 3D reconstruction of the OR shows multiple team members focused on the x-ray monitor; one participant’s line-of-sight (blue) is blocked, while a counterfactual repositioning 1 m left-posterior (light blue) restores visibility without obstructing others. (c) Egocentric replays comparing the original and optimized viewpoints demonstrate how EgoSurg converts anecdotal workflow inefficiencies into measurable, actionable evidence, supporting spatial reconfiguration to improve coordination.
  • ...and 4 more figures
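
Two steps named in the Figure 2 caption are easy to illustrate: lifting a stereo depth map into a point cloud using camera calibration (panel b), and scoring a synthesized view against a reference with PSNR (panels e and f). The sketch below is not the authors' implementation. It assumes a simple pinhole camera model without lens distortion, a camera-to-world pose given as a rotation `R` and translation `t`, and 8-bit images flattened to sequences of intensities. The function names `backproject_depth` and `psnr` are hypothetical and chosen only for this illustration.

```python
import math


def backproject_depth(depth, fx, fy, cx, cy, R, t):
    """Lift a depth map to 3D world points with a pinhole model.

    depth: rows of metric depth values; 0, None, or negative marks an invalid pixel.
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed, no distortion).
    R, t: camera-to-world rotation (3x3 nested lists) and translation (length 3).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not z or z <= 0:
                continue  # skip pixels without a valid depth estimate
            # Pixel coordinates -> camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Camera coordinates -> world coordinates: X_w = R X_c + t.
            world = tuple(
                sum(R[i][j] * cam[j] for j in range(3)) + t[i]
                for i in range(3)
            )
            points.append(world)
    return points


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    toy_depth = [[2.0, 0.0], [0.0, 2.0]]  # two valid pixels, two holes
    cloud = backproject_depth(toy_depth, 1.0, 1.0, 0.0, 0.0, identity, (1.0, 0.0, 0.0))
    print(cloud)
    print(psnr([10, 10, 10, 10], [20, 20, 20, 20]))
```

In a multi-camera setup, the per-camera point sets would be concatenated into one cloud in a shared world frame, which is why the pose is applied inside the back-projection. The toy values here are illustrative only and are unrelated to the scores reported in the caption.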