Table of Contents
Fetching ...

Geometry OR Tracker: Universal Geometric Operating Room Tracking

Yihua Shao, Kang Chen, Feng Xue, Siyu Chen, Long Bai, Hongyuan Yu, Hao Tang, Jinlin Wu, Nassir Navab

TL;DR

Geometry OR Tracker is introduced, a two-stage pipeline that first rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup with a single global scale via a Multi-view Metric Geometry Rectification module, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame.

Abstract

In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition, where physically meaningful quantities such as distances and motion statistics must be measured in meters. However, real clinical deployments rarely satisfy the geometric prerequisites for stable multi-view fusion and tracking: camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency that produces "ghosting" during fusion and degrades 3D trajectories in a shared OR coordinate frame. To address this, we introduce Geometry OR Tracker, a two-stage pipeline that first rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup with a single global scale via a Multi-view Metric Geometry Rectification module, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame. On the MM-OR benchmark, improved geometric consistency translates into tracking gains: our rectification front-end reduces cross-view depth disagreement by more than 30$\times$ compared to raw calibration. Ablation studies further demonstrate the relationship between calibration quality and tracking accuracy, showing that improved geometric consistency yields stronger world-frame tracking.

Geometry OR Tracker: Universal Geometric Operating Room Tracking

TL;DR

Geometry OR Tracker is introduced, a two-stage pipeline that first rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup with a single global scale via a Multi-view Metric Geometry Rectification module, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame.

Abstract

In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition, where physically meaningful quantities such as distances and motion statistics must be measured in meters. However, real clinical deployments rarely satisfy the geometric prerequisites for stable multi-view fusion and tracking: camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency that produces "ghosting" during fusion and degrades 3D trajectories in a shared OR coordinate frame. To address this, we introduce Geometry OR Tracker, a two-stage pipeline that first rectifies imprecise calibration into a scaleconsistent and geometrically consistent camera setup with a single global scale via a Multi-view Metric Geometry Rectification module, and then performs Occlusion-Robust 3D Point Tracking directly in the unified OR world frame. On the MM-OR benchmark, improved geometric consistency translates into tracking gains: our rectification front-end reduces cross-view depth disagreement by more than 30 compared to raw calibration. Ablation studies further demonstrate the relationship between calibration quality and tracking accuracy, showing that improved geometric consistency yields stronger world-frame tracking.
Paper Structure (10 sections, 7 equations, 2 figures, 3 tables)

This paper contains 10 sections, 7 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Pipeline of Geometry OR Tracker. Our method decouples geometry rectification from downstream tracking. Multi-view Metric Calibration Rectification produces a metric-consistent camera calibration and per-frame depth with a global metric scale; Occlusion-Robust Metric 3D Point Tracking then lifts multi-view observations into a fused 3D feature point cloud and tracks 3D query points in a shared room coordinate frame via local 3D neighborhood retrieval and iterative refinement.
  • Figure 2: 2D qualitative comparison of tracking between baselines and our method.