Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan, Neel Dey, Polina Golland
TL;DR
This work introduces DiffPose, a self-supervised framework for 2D/3D registration that aligns intraoperative X-rays to a preoperative CT. A patient-specific pose encoder, trained on synthetic X-rays rendered from that CT, supplies an initial pose estimate, which is then refined through a differentiable X-ray renderer. By optimizing in the Lie algebra $\mathfrak{se}(3)$ with a composite loss that combines geodesic and multiscale image similarity terms, DiffPose achieves sub-millimeter accuracy at intraoperative speeds across surgical datasets without manual labeling. It improves on existing unsupervised methods by an order of magnitude and also outperforms supervised baselines, suggesting a practical path toward reliable, label-free intraoperative guidance. Limitations include the current restriction to rigid registration and the per-patient pretraining time; possible extensions include piecewise-rigid or deformable registration and faster initialization strategies.
Abstract
Surgical decisions are informed by aligning rapid portable 2D intraoperative images (e.g., X-rays) to a high-fidelity 3D preoperative reference scan (e.g., CT). 2D/3D image registration often fails in practice: conventional optimization methods are prohibitively slow and susceptible to local minima, while neural networks trained on small datasets fail on new patients or require impractical landmark supervision. We present DiffPose, a self-supervised approach that leverages patient-specific simulation and differentiable physics-based rendering to achieve accurate 2D/3D registration without relying on manually labeled data. Preoperatively, a CNN is trained to regress the pose of a randomly oriented synthetic X-ray rendered from the preoperative CT. The CNN then initializes rapid intraoperative test-time optimization that uses the differentiable X-ray renderer to refine the solution. Our work further proposes several geometrically principled methods for sampling camera poses from $\mathbf{SE}(3)$, for sparse differentiable rendering, and for driving registration in the tangent space $\mathfrak{se}(3)$ with geodesic and multiscale locality-sensitive losses. DiffPose achieves sub-millimeter accuracy across surgical datasets at intraoperative speeds, improving upon existing unsupervised methods by an order of magnitude and even outperforming supervised baselines. Our code is available at https://github.com/eigenvivek/DiffPose.
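To make the role of the tangent space $\mathfrak{se}(3)$ concrete, the following is a minimal NumPy sketch of the standard exponential map from $\mathfrak{se}(3)$ to $\mathbf{SE}(3)$ and of the geodesic angle between two rotations. It is not taken from the DiffPose code base: the function names (`hat`, `se3_exp`, `rotation_geodesic`, `pose_distance`), the ordering of the 6-vector as rotation first and translation second, and the choice to report rotation and translation errors separately are all assumptions made for illustration.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector (w, v) in se(3) -> 4x4 matrix in SE(3).

    Assumed layout (illustrative): xi[:3] is the rotation part w,
    xi[3:] is the translation part v.
    """
    xi = np.asarray(xi, dtype=float)
    w, v = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Taylor expansion near the identity avoids division by zero.
        R = np.eye(3) + W + 0.5 * W @ W
        V = np.eye(3) + 0.5 * W + W @ W / 6.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * W + b * W @ W   # Rodrigues' formula
        V = np.eye(3) + b * W + c * W @ W   # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def rotation_geodesic(R1, R2):
    """Geodesic distance on SO(3): the angle (radians) of R1^T R2."""
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def pose_distance(T1, T2):
    """Return (rotation angle in radians, Euclidean translation error)."""
    angle = rotation_geodesic(T1[:3, :3], T2[:3, :3])
    shift = np.linalg.norm(T1[:3, 3] - T2[:3, 3])
    return angle, shift

# Usage: a quarter turn about the z-axis combined with a translation.
T = se3_exp([0.0, 0.0, np.pi / 2, 1.0, 0.0, 0.0])
angle, shift = pose_distance(np.eye(4), T)
```

Parameterizing the pose as an unconstrained 6-vector means a gradient step cannot leave the set of valid rigid transforms, which is the usual motivation for optimizing in the tangent space rather than directly on the matrix entries. In a registration loop, the same formulas would be written in an automatic differentiation framework so that gradients of an image similarity loss can flow back to the 6-vector.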
