Rapid patient-specific neural networks for intraoperative X-ray to volume registration

Vivek Gopalakrishnan, Neel Dey, David-Dimitris Chlorogiannis, Andrew Abumoussa, Anna M. Larson, Darren B. Orbach, Sarah Frisken, Polina Golland

TL;DR

This work tackles the challenge of robust, fast 2D/3D registration between intraoperative X-ray images and preoperative volumes, a key bottleneck in image-guided interventions. The authors propose xvr, a self-supervised framework that trains patient-specific pose regression networks using synthetic X-rays generated from a patient’s own preoperative imaging via a differentiable X-ray renderer, followed by rapid gradient-based pose refinement. A major contribution is the amortized training strategy: pretrain a patient-agnostic model on diverse datasets and then finetune per patient in about five minutes, enabling practical use in emergencies and routine procedures. Extensive evaluation across pelvic, neurovascular, and skull cases from multiple hospitals demonstrates submillimeter accuracy and robust performance, with open-source release to facilitate broad adoption and further development.
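The two-stage idea described above (an initial pose estimate followed by gradient-based refinement against a differentiable renderer) can be sketched on a toy problem. Everything in the block is an assumption made for illustration and is not xvr's API: the "anatomy" is three 2D Gaussian blobs, the "X-ray" is a 1D parallel-beam projection with a two-parameter pose (angle, shift), the loss is mean squared error, the gradient comes from central finite differences rather than automatic differentiation, and the starting pose is hard-coded where a network prediction would go.

```python
import math

# Hypothetical toy anatomy: isotropic 2D Gaussian blobs given as (x, y, weight).
BLOBS = [(10.0, 0.0, 1.0), (-5.0, 12.0, 0.7), (0.0, -15.0, 0.5)]
SIGMA = 5.0
DETECTOR = [float(u) for u in range(-40, 41)]  # 1D detector pixel coordinates


def render(theta, shift):
    """Parallel-beam projection of the blobs onto a 1D detector.

    The line integral of an isotropic 2D Gaussian is a 1D Gaussian, so the
    projection has a closed form that is smooth in (theta, shift).
    """
    ax, ay = math.cos(theta), math.sin(theta)  # detector axis direction
    amp = math.sqrt(2.0 * math.pi) * SIGMA
    image = []
    for u in DETECTOR:
        value = 0.0
        for x, y, w in BLOBS:
            centre = x * ax + y * ay + shift
            value += w * amp * math.exp(-((u - centre) ** 2) / (2.0 * SIGMA**2))
        image.append(value)
    return image


def loss(pose, target):
    """Mean squared error between the rendering at `pose` and the target image."""
    image = render(*pose)
    return sum((a - b) ** 2 for a, b in zip(image, target)) / len(target)


def gradient(pose, target, eps=1e-5):
    """Central finite differences, standing in for automatic differentiation."""
    grad = []
    for i in range(len(pose)):
        hi = list(pose)
        lo = list(pose)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi, target) - loss(lo, target)) / (2.0 * eps))
    return grad


def refine(pose, target, steps=100, scales=(1e-3, 1e-1)):
    """Gradient descent with per-parameter scales and a backtracking step size."""
    pose = list(pose)
    current = loss(pose, target)
    step = 1.0
    for _ in range(steps):
        grad = gradient(pose, target)
        for _ in range(30):
            trial = [p - step * s * g for p, s, g in zip(pose, scales, grad)]
            trial_loss = loss(trial, target)
            if trial_loss < current:  # accept only steps that reduce the loss
                pose, current = trial, trial_loss
                step *= 2.0
                break
            step *= 0.5
        else:
            break  # no improving step found; stop refining
    return pose, current


true_pose = (0.30, 2.0)
target = render(*true_pose)    # plays the role of the intraoperative X-ray
initial_pose = (0.20, 0.0)     # plays the role of the network's initial estimate
initial_loss = loss(initial_pose, target)
refined_pose, final_loss = refine(initial_pose, target)
print("initial loss:", initial_loss)
print("refined pose:", refined_pose, "final loss:", final_loss)
```

The separate step scales for the angle and the shift reflect that the image is far more sensitive to a radian of rotation than to a pixel of translation in this toy setup; the specific values are guesses that suit this example only.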

Abstract

The integration of artificial intelligence in image-guided interventions holds transformative potential, promising to extract 3D geometric and quantitative information from conventional 2D imaging modalities during complex procedures. Achieving this requires the rapid and precise alignment of 2D intraoperative images (e.g., X-ray) with 3D preoperative volumes (e.g., CT, MRI). However, current 2D/3D registration methods fail across the broad spectrum of procedures dependent on X-ray guidance: traditional optimization techniques require custom parameter tuning for each subject, whereas neural networks trained on small datasets do not generalize to new patients or require labor-intensive manual annotations, increasing clinical burden and precluding application to new anatomical targets. To address these challenges, we present xvr, a fully automated framework for training patient-specific neural networks for 2D/3D registration. xvr uses physics-based simulation to generate abundant high-quality training data from a patient's own preoperative volumetric imaging, thereby overcoming the inherently limited ability of supervised models to generalize to new patients and procedures. Furthermore, xvr requires only 5 minutes of training per patient, making it suitable for emergency interventions as well as planned procedures. We perform the largest evaluation of a 2D/3D registration algorithm on real X-ray data to date and find that xvr robustly generalizes across a diverse dataset comprising multiple anatomical structures, imaging modalities, and hospitals. Across surgical tasks, xvr achieves submillimeter-accurate registration at intraoperative speeds, improving upon existing methods by an order of magnitude. xvr is released as open-source software freely available at https://github.com/eigenvivek/xvr.
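To illustrate why simulation makes pose labels free, the following minimal sketch samples random poses, renders a synthetic image for each, and keeps the (image, pose) pairs as training data. The toy blob renderer, the sampling ranges, and all function names are hypothetical, and a nearest-neighbour lookup is swapped in as a deliberately crude stand-in for the trained pose regression network.

```python
import math
import random

# Hypothetical toy anatomy: isotropic 2D Gaussian blobs given as (x, y, weight).
BLOBS = [(10.0, 0.0, 1.0), (-5.0, 12.0, 0.7), (0.0, -15.0, 0.5)]
SIGMA = 5.0


def render(theta, shift, n_pixels=81):
    """Closed-form 1D parallel projection of the blobs (toy stand-in for a renderer)."""
    ax, ay = math.cos(theta), math.sin(theta)
    half = n_pixels // 2
    return [
        sum(
            w * math.exp(-((u - (x * ax + y * ay + shift)) ** 2) / (2 * SIGMA**2))
            for x, y, w in BLOBS
        )
        for u in range(-half, half + 1)
    ]


def sample_training_pairs(n, rng, max_angle=0.5, max_shift=5.0):
    """Sample poses uniformly and render each one; the sampled pose is the label."""
    pairs = []
    for _ in range(n):
        pose = (rng.uniform(-max_angle, max_angle), rng.uniform(-max_shift, max_shift))
        pairs.append((render(*pose), pose))
    return pairs


def predict_pose(image, pairs):
    """Nearest-neighbour lookup in image space (stand-in for a learned regressor)."""
    def squared_distance(pair):
        return sum((a - b) ** 2 for a, b in zip(image, pair[0]))
    return min(pairs, key=squared_distance)[1]


rng = random.Random(0)
pairs = sample_training_pairs(500, rng)

query_pose = (0.1, -2.0)  # pose of an unseen image
estimate = predict_pose(render(*query_pose), pairs)
print("query pose:", query_pose, "estimated pose:", estimate)
```

No manual annotation appears anywhere in the loop: because every image is rendered from a known pose, the supervision signal is generated together with the data.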

Paper Structure

This paper contains 31 sections, 21 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Rapidly trained patient-specific neural networks with xvr achieve submillimeter accuracy in intraoperative 2D/3D registration without disrupting existing clinical workflows. (A) Preoperative 3D imaging (e.g., CT or MRI) is commonly acquired before many image-guided procedures. (B) Clinical teams make diagnoses and preoperative plans from these scans, which can take anywhere from minutes to multiple days depending on the intervention (e.g., stroke vs. radiotherapy). (C and D) During the preoperative phase, we train a patient-specific network to regress the ground truth (g.t.) pose of a synthetic training X-ray rendered from the patient's 3D imaging. These synthetic X-rays are generated using our differentiable X-ray renderer, which is designed to simulate the imaging physics and geometry of a C-arm. With xvr, patient-specific neural networks can be trained in as little as 5 minutes. (E) Intraoperatively, 3D volumes can no longer be acquired, and live 2D X-rays are used instead for guidance. (F) Trained networks are then deployed during interventions, performing accurate 2D/3D registration in seconds. This enables numerous applications for 3D-aware image guidance, such as the reprojection of 3D preoperative plans onto intraoperative imaging to highlight interventional targets or the identification of shared anatomical structures across multiple X-ray images of the patient using epipolar geometry.
  • Figure 2: xvr implements a physics-based differentiable renderer that simulates the geometry of an X-ray C-arm to generate photorealistic X-ray images from 3D volumes. (A) Our renderer requires two inputs: a 3D volume from which to generate synthetic X-rays and the pose of the C-arm (represented with a camera frustum). Our renderer is differentiable with respect to the C-arm pose, allowing us to use gradient-based optimization to register X-ray images to 3D volumes. (B) A pictorial overview of trilinear interpolation, one of the ray tracing methods we implement to render synthetic X-rays (along with Siddon's method; Siddon, 1985). (C) Optionally, a 3D label map of the preoperative volume can also be used to render X-rays of specific anatomical structures. (D) In addition to developing fully differentiable implementations of ray tracing with trilinear interpolation and Siddon's method, we also adapt these algorithms to project 3D anatomical labels into 2D space, enabling structure-specific registration. (E and F) Comparisons of real X-rays to synthetic images rendered from volumetric imaging of the same patients using successfully registered C-arm poses demonstrate the fidelity achievable with xvr.
  • Figure 3: Pretraining on publicly available datasets enables minutes-long patient-specific finetuning. (A) 3D renderings of pelvic CT scans from lower body cadavers in the DeepFluoro dataset (top). Volumes in the CTPelvic1K dataset are clinical scans of diverse hospitalized patients and contain findings not present in DeepFluoro, such as fractures and metal implants (bottom). (B) Maximum intensity projections (MIPs) of 3D rotational DSAs (rDSAs) from the Ljubljana dataset (top). Compared to MIPs of the TOF MRAs in the NITRC dataset, rDSAs typically capture a single hemisphere of circulation and do not contain any non-vessel anatomy (bottom). (C and D) After 12 of training on our synthetic X-ray task, patient-specific networks (blue) produce very accurate initial pose estimates (2040), while patient-agnostic networks trained for 48 (orange) have higher error (5080 for DeepFluoro and 90190 for Ljubljana). A finetuned model (pink) initialized from the patient-agnostic model matches the accuracy of the patient-specific model with only 5 minutes of training. Error bars represent one standard deviation of pose estimation error averaged across the X-rays from all patients. (E and F) Renderings of synthetic X-rays from the pose predicted by the various models after 5 minutes of neural network training (top). Only the finetuned model (pink) achieves acceptable error at this stage. The patient-agnostic (orange) and patient-specific (blue) models achieve comparable accuracy after 48 and 12 of training, respectively (bottom). Additionally, the effects of rigid registration over center-alignment when aligning the patient-specific volume to the pretraining dataset can be noted by comparing the patient-agnostic initial pose estimates at 48 between the DeepFluoro and Ljubljana examples. Note that ground truth and estimated fiducials are not used during pose estimation, but rather are used post hoc to visualize and quantify registration error.
  • Figure 4: Differentiable pose refinement achieves submillimeter registration accuracy. (A and B) Initial and final pose estimate errors for multiple initialization and iterative pose refinement strategies. Each method is annotated with the amount of neural network training time required, the percentage of X-rays that are successfully registered with less than 1 mm of error, and the renderer used to drive pose refinement. The bolded methods (patient-agnostic, patient-specific, and finetuned) are all part of xvr. (C) Our patient-specific neural networks achieve low initial pose estimation errors across all patients, whereas supervised methods exhibit high inter-subject variation and frequent out-of-distribution failures. (D and E) Survival curves of the final pose estimation error for various registration methods at multiple different success thresholds in DeepFluoro and Ljubljana, respectively. (F) Cumulative success rates for various registration methods quantified by the area under the survival curves demonstrate the superior performance of patient-specific models, whether trained from scratch or via finetuning. Finetuning via transfer learning is particularly important for Ljubljana as precise 3D/3D registration of patient-specific preoperative volumes to the pretraining dataset is more difficult for soft-tissue (vasculature) than bony structures (pelvic anatomy). (G) Initial pose estimates produced by the various pose estimation strategies for a particularly challenging intraoperative X-ray (top). The extreme cranial angle of this view is very far from a standard frontal view (Fixed Initialization). Therefore, such poses are severely underrepresented in the training set of real X-ray images, and thus, the supervised model (Landmark Initialization) suffers an out-of-distribution failure and predicts an implausible initial pose. In contrast, the patient-specific and finetuned models predict highly accurate initial pose estimates, which are quickly refined to yield a submillimeter accurate registration. Again, ground truth and estimated fiducial markers are only used for post hoc error visualization and error quantification, not during pose estimation.
  • Figure 5: xvr enables the rapid registration of large volumes of real-world clinical data. (A) A patient-agnostic pose estimation model was trained using synthetic X-rays rendered from 61 preregistered head CTs in the TotalSegmentator dataset. Using this model and iterative pose refinement, 122 intraoperative X-rays acquired from 50 neurosurgical patients at Brigham and Women's Hospital were registered to their corresponding preoperative 3D imaging. Registered C-arm poses from all 122 X-rays are visualized relative to the template head CT used for 3D preregistration. (B) Distributions of the estimated pose parameters reveal interesting clinical patterns, e.g., right anterior oblique (RAO) lateral X-rays are acquired 8× more frequently in this dataset than left anterior oblique (LAO) X-rays. (C–E) From manual evaluations by trained neuroradiologists and neurointerventionalists, registrations produced by xvr achieved a higher average success rate (96.2%) compared to registrations initialized from pose parameters in the DICOM header (30.5%). (C) xvr's neural network retains its accuracy even when tested on intraoperative images containing interventional findings, such as embolized vessels or craniotomies, which are not represented in the pretraining dataset. (D) Pose parameters provided in the DICOM header do not account for the motion of the patient relative to the C-arm, which often leads to insurmountably high initial pose estimation error. In contrast, xvr produces consistently accurate initial pose estimates, even for unconventional views. (E) Compared to CTs in benchmark datasets, clinical CTs sometimes contain smaller fields-of-view so as to limit radiation exposure. Even with this limitation, xvr can still register partial CT renders to full field-of-view X-rays. From these registrations, soft tissue findings encased within the skull's rigid structure (e.g., the location of a tumor or hemorrhage) can be reprojected from CT onto intraoperative X-rays for augmented image guidance.
  • ...and 3 more figures
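
The Figure 2 caption names ray tracing with trilinear interpolation as one of the rendering methods. As a rough illustration of that technique only, the sketch below samples a voxel grid at evenly spaced points along each source-to-pixel ray, interpolates trilinearly at each sample, and sums the samples into a line integral. It is written in plain Python on nested lists; the function names, the fixed detector plane, the unit voxel spacing, and the midpoint sampling rule are all assumptions for this example and do not reproduce xvr's renderer.

```python
import math


def trilinear(volume, x, y, z):
    """Interpolate volume[z][y][x] at a continuous voxel coordinate (0 outside).

    Assumes at least two voxels along every axis.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
        return 0.0
    # Lower corner of the enclosing cell, clamped so the upper corner stays in bounds.
    x0, y0, z0 = min(int(x), nx - 2), min(int(y), ny - 2), min(int(z), nz - 2)
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    for dz, wz in ((0, 1 - fz), (1, fz)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            for dx, wx in ((0, 1 - fx), (1, fx)):
                value += wz * wy * wx * volume[z0 + dz][y0 + dy][x0 + dx]
    return value


def integrate_ray(volume, source, target, n_samples=200):
    """Approximate the line integral from source to target by midpoint sampling."""
    step = math.dist(source, target) / n_samples
    total = 0.0
    for i in range(n_samples):
        t = (i + 0.5) / n_samples
        x, y, z = (s + t * (e - s) for s, e in zip(source, target))
        total += trilinear(volume, x, y, z)
    return total * step


def render(volume, source, detector_x, size=8, spacing=1.0):
    """Cone-beam projection onto a size x size detector in the plane x = detector_x."""
    _, cy, cz = source  # centre the detector on the source's y and z coordinates
    offset = (size - 1) / 2
    image = []
    for r in range(size):
        row = []
        for c in range(size):
            pixel = (detector_x, cy + (c - offset) * spacing, cz + (r - offset) * spacing)
            row.append(integrate_ray(volume, source, pixel))
        image.append(row)
    return image


# A 4 x 4 x 4 block of unit attenuation, indexed as volume[z][y][x].
ones = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
image = render(ones, source=(-10.0, 1.5, 1.5), detector_x=10.0)
for row in image:
    print(" ".join(f"{v:5.2f}" for v in row))
```

Because every operation here is a smooth function of the sample coordinates (away from the volume boundary), the same computation expressed in an automatic differentiation framework yields gradients with respect to the source and detector positions, which is the property the caption relies on for gradient-based registration.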