
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

TL;DR

This work introduces DiffPose, a self-supervised framework for 2D/3D registration that aligns intraoperative X-rays to a preoperative CT. A patient-specific pose encoder is trained on synthetic X-rays rendered from the patient's CT, and its pose estimates are then refined with a differentiable X-ray renderer. By performing optimization in the Lie algebra $\mathfrak{se}(3)$ and using a composite loss that combines geodesic and multiscale image similarity terms, DiffPose achieves sub-millimeter accuracy at intraoperative speeds across datasets without manual labeling. It improves on existing unsupervised methods by an order of magnitude, outperforms supervised baselines, and generalizes across surgical domains, suggesting a practical path toward reliable, label-free intraoperative guidance. Limitations include the restriction to rigid registration and the per-patient pretraining time; possible extensions include piecewise-rigid or deformable registration and faster initialization strategies.
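To make the tangent-space idea concrete, here is a minimal NumPy sketch (not the DiffPose implementation) of the $\mathfrak{se}(3)$ exponential map and of one possible geodesic pose distance. It assumes a twist ordered as (translation, rotation vector) and uses a simplified distance that combines the SO(3) geodesic angle with the Euclidean translation error; the names `se3_exp`, `geodesic_distance`, and the `rot_weight` parameter are hypothetical.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric (cross-product) matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def se3_exp(xi):
    """Exponential map from se(3) to SE(3).

    xi = (rho, omega): rho (3,) is the translational part and omega (3,) the
    rotation vector (axis times angle, in radians). Returns a 4x4 rigid transform.
    """
    xi = np.asarray(xi, dtype=float)
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    W2 = W @ W
    I = np.eye(3)
    if theta < 1e-6:
        # Second-order Taylor expansion avoids dividing by a tiny angle.
        R = I + W + 0.5 * W2
        V = I + 0.5 * W + W2 / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = I + A * W + B * W2  # Rodrigues' rotation formula
        V = I + B * W + C * W2  # left Jacobian of SO(3), maps rho to translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T


def geodesic_distance(T1, T2, rot_weight=1.0):
    """Simplified pose distance: SO(3) geodesic angle between the rotations
    combined with the Euclidean distance between the translations.

    rot_weight (hypothetical) converts radians into translation units.
    """
    R_rel = T1[:3, :3].T @ T2[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    dist = np.linalg.norm(T1[:3, 3] - T2[:3, 3])
    return float(np.sqrt((rot_weight * angle) ** 2 + dist**2))


# A step computed in the tangent space is applied by composition,
# so the updated pose is always a valid rigid transform.
T = np.eye(4)
step = np.array([0.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2])  # hypothetical update
T_new = T @ se3_exp(step)
print(round(geodesic_distance(T, T_new), 4))  # 1.5708
```

Optimizing a 6-vector in $\mathfrak{se}(3)$ and mapping it through the exponential keeps every iterate a valid rigid transform without re-orthonormalization, and it avoids the singularities of Euler-angle parameterizations.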

Abstract

Surgical decisions are informed by aligning rapid portable 2D intraoperative images (e.g., X-rays) to a high-fidelity 3D preoperative reference scan (e.g., CT). 2D/3D image registration often fails in practice: conventional optimization methods are prohibitively slow and susceptible to local minima, while neural networks trained on small datasets fail on new patients or require impractical landmark supervision. We present DiffPose, a self-supervised approach that leverages patient-specific simulation and differentiable physics-based rendering to achieve accurate 2D/3D registration without relying on manually labeled data. Preoperatively, a CNN is trained to regress the pose of a randomly oriented synthetic X-ray rendered from the preoperative CT. The CNN then initializes rapid intraoperative test-time optimization that uses the differentiable X-ray renderer to refine the solution. Our work further proposes several geometrically principled methods for sampling camera poses from $\mathbf{SE}(3)$, for sparse differentiable rendering, and for driving registration in the tangent space $\mathfrak{se}(3)$ with geodesic and multiscale locality-sensitive losses. DiffPose achieves sub-millimeter accuracy across surgical datasets at intraoperative speeds, improving upon existing unsupervised methods by an order of magnitude and even outperforming supervised baselines. Our code is available at https://github.com/eigenvivek/DiffPose.
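The multiscale locality-sensitive loss mentioned in the abstract can be illustrated with normalized cross-correlation (NCC) computed both over the whole image and over local patches. The sketch below only assumes this general form: the patch sizes, the uniform weights, and the function names are hypothetical, and the actual DiffPose loss may differ.

```python
import numpy as np


def ncc(x, y, eps=1e-8):
    """Zero-normalized cross-correlation of two equally shaped arrays, in [-1, 1]."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum()) + eps
    return float((x * y).sum() / denom)


def patch_ncc(x, y, patch_size):
    """Mean NCC over non-overlapping patch_size x patch_size tiles (local similarity).

    Assumes both image sides are divisible by patch_size, for simplicity.
    """
    H, W = x.shape
    scores = [
        ncc(x[i:i + patch_size, j:j + patch_size], y[i:i + patch_size, j:j + patch_size])
        for i in range(0, H, patch_size)
        for j in range(0, W, patch_size)
    ]
    return float(np.mean(scores))


def multiscale_ncc(x, y, patch_sizes=(None, 32, 16), weights=None):
    """Weighted average of NCC computed globally (patch_size=None) and over local patches.

    The default scales and uniform weights are hypothetical choices.
    """
    if weights is None:
        weights = [1.0 / len(patch_sizes)] * len(patch_sizes)
    total = 0.0
    for size, w in zip(patch_sizes, weights):
        score = ncc(x, y) if size is None else patch_ncc(x, y, size)
        total += w * score
    return total


# Usage: a registration loss could be 1 - multiscale_ncc(real_xray, rendered_drr).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
print(round(multiscale_ncc(image, 2.0 * image + 3.0), 6))  # 1.0
```

Because NCC is invariant to affine intensity changes, it tolerates some of the contrast mismatch between real and rendered X-rays, while the patch-level terms keep small local misalignments from being averaged away by the global score.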

Paper Structure

This paper contains 29 sections, 20 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: DiffPose setup. Left: Camera poses are sampled via random perturbations from the isocenter pose $\mathbf{T}_\mathrm{iso}$. Right: An encoder is trained to regress the pose of a synthetic X-ray using a combination of image similarity and $\mathbf{SE}(3)$-geodesic losses. At inference, the pose of a real intraoperative X-ray is estimated by the encoder and iteratively refined using test-time optimization with differentiable rendering.
  • Figure 2: Sample renders. Raw X-rays are preprocessed to match the image formation model in [gopalakrishnan2022fast]. Difference maps between intraoperative X-rays and renderings from a preoperative CT visualize the domain shift between real and synthetic images. In Row 2, the left femur moves between the acquisition of the preoperative and intraoperative images; in Rows 3 and 4, the 3D volumes do not capture the smallest cranial blood vessels [luo2014low], so these vessels cannot be rendered.
  • Figure 3: Quantitative evaluation. Evaluation of different registration methods on the DeepFluoro dataset via mean target registration error (mTRE). A method successfully registered an X-ray if the final mTRE was less than one millimeter (red line). DiffPose is the only method that consistently achieves sub-millimeter mTRE, outperforming fully supervised methods (PoseNet and PnP-Regularizer). Note that the y-axis is on a log scale.
  • Figure 4: Qualitative visualizations. Top: Renderings at the final pose estimates produced by different registration methods. Correspondences are drawn between true landmarks (blue) and estimated landmarks (orange). Bottom: To compare geometric alignment and not appearance, error maps are computed as the difference between the X-rays rendered at the ground truth pose and at the final pose estimate.
  • Figure 5: External dataset validation. Using modeling decisions from a pelvic dataset, DiffPose demonstrates high registration accuracy on blood vessels in the brain. Top: Renders at final pose estimates with associated mTRE. Bottom: Error maps between X-rays rendered at the estimated and ground truth poses.
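Figures 3 and 5 report accuracy as mTRE, and Figure 3 counts a registration as successful below 1 mm. The sketch below assumes mTRE is the mean 3D Euclidean distance between landmarks transformed by the ground-truth and estimated poses, which is one common definition; the paper's exact evaluation protocol (for example, which landmarks are used or whether errors are measured after projection) may differ, and the landmark values are made up for illustration.

```python
import numpy as np


def apply_pose(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]


def mtre(T_true, T_est, landmarks):
    """Mean target registration error: mean Euclidean distance between the
    landmarks mapped by the ground-truth pose and by the estimated pose."""
    diff = apply_pose(T_true, landmarks) - apply_pose(T_est, landmarks)
    return float(np.linalg.norm(diff, axis=1).mean())


def success_rate(mtre_values_mm, threshold_mm=1.0):
    """Fraction of X-rays registered with a final mTRE below threshold_mm."""
    values = np.asarray(mtre_values_mm, dtype=float)
    return float((values < threshold_mm).mean())


# Hypothetical landmarks (mm) and an estimate that is off by 0.5 mm along x.
landmarks = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
T_true = np.eye(4)
T_est = np.eye(4)
T_est[0, 3] = 0.5
print(mtre(T_true, T_est, landmarks))  # 0.5
print(success_rate([0.4, 0.9, 1.2, 3.0]))  # 0.5
```

Because mTRE is measured at landmarks, a small rotational error produces a larger mTRE for landmarks far from the rotation center, which is one reason a sub-millimeter threshold is demanding.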