Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures

Florence Klitzner, Blanca Inigo, Benjamin D. Killeen, Lalithkumar Seenivasan, Michelle Song, Axel Krieger, Mathias Unberath

TL;DR

This work evaluates whether imitation-policy learning can drive planning and open-loop control for bi-plane X-ray guided spine cannula insertion using only fluoroscopic input. A high-fidelity in silico sandbox and a dataset of surgeon-like trajectories paired with bi-plane X-ray sequences enable transformer-based policies to predict incremental cannula adjustments in the cannula's local frame. Results show strong performance in simulated cases with a first-pass acceptance of $68.5\%$ on held-out synthetic data, $49.2\%$ on fractured anatomy, and partial transfer to real X-ray images at $34.8\%$, highlighting both the promise and current limitations of CT-free, vision-based spinal navigation. The analysis identifies entry-point localization as a primary bottleneck and argues that higher-frequency feedback will be essential for achieving closed-loop control in real-world deployments. Overall, the study establishes a foundation for lightweight, CT-free robotic intra-operative spinal navigation while delineating clear directions for improvement through priors and domain adaptation.
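The idea of predicting incremental adjustments in the cannula's local frame can be illustrated with homogeneous transforms. The following is a minimal sketch, not the authors' implementation: it assumes each predicted action can be written as a small 4x4 rigid transform, and all function names (`rotation_about_axis`, `make_pose`, `apply_local_delta`, `rollout`) are hypothetical. Right-multiplying the current pose by the delta applies the adjustment relative to the cannula's own axes rather than the world axes.

```python
import numpy as np


def rotation_about_axis(axis, angle_rad):
    """Rotation matrix for a given axis and angle (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)


def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def apply_local_delta(T_world_cannula, T_delta_local):
    """Apply an adjustment expressed in the cannula's local frame.

    Right-multiplication means the delta is interpreted relative to the
    cannula's current axes (assumed convention for this sketch).
    """
    return T_world_cannula @ T_delta_local


def rollout(T_start, deltas):
    """Accumulate a sequence of local deltas into a list of world poses."""
    T = T_start.copy()
    poses = [T]
    for delta in deltas:
        T = apply_local_delta(T, delta)
        poses.append(T)
    return poses


# Hypothetical example: cannula rotated 90 degrees about world z, at (10, 0, 0) mm.
T0 = make_pose(rotation_about_axis([0, 0, 1], np.pi / 2), [10.0, 0.0, 0.0])
# One action: advance 5 mm along the cannula's local x axis, no rotation.
step = make_pose(np.eye(3), [5.0, 0.0, 0.0])
T1 = apply_local_delta(T0, step)
```

In this example the tip moves to (10, 5, 0) in world coordinates, because the local x axis points along world y after the 90-degree rotation. Expressing actions this way keeps them independent of where the cannula sits in the world frame.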

Abstract

Imitation learning-based robot control policies are enjoying renewed interest in video-based robotics. However, it remains unclear whether this approach applies to X-ray-guided procedures, such as spine instrumentation, because interpreting multi-view X-rays is complex. We examine opportunities and challenges for imitation policy learning in bi-plane-guided cannula insertion. We develop an in silico sandbox for scalable, automated simulation of X-ray-guided spine procedures with a high degree of realism. We curate a dataset of correct trajectories and corresponding bi-planar X-ray sequences that emulate the stepwise alignment of providers. We then train imitation learning policies for planning and open-loop control that iteratively align a cannula solely based on visual information. This precisely controlled setup offers insights into the limitations and capabilities of this method. Our policy succeeded on the first attempt in 68.5% of cases, maintaining safe intra-pedicular trajectories across diverse vertebral levels. The policy generalized to complex anatomy, including fractures, and remained robust to varied initializations. Rollouts on real bi-planar X-rays further suggest that the model can produce plausible trajectories, despite training exclusively in simulation. While these preliminary results are promising, we also identify limitations, especially in entry point precision. Full closed-loop control will require additional consideration of how to provide sufficiently frequent feedback. With more robust priors and domain knowledge, such models may provide a foundation for future efforts toward lightweight and CT-free robotic intra-operative spinal navigation.

Paper Structure

This paper contains 17 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: Model Overview. Left to right: Inputs consisting of the current AP and lateral observations are processed by a conditional variational autoencoder. Fine-grained cannula pose adjustments are predicted as actions that generate the final insertion trajectory while modeling surgeon-like adjustments.
  • Figure 2: Data Generation Pipeline. Left to right: CT scans from the NMDID dataset [edgar2020_NMDID] are preprocessed using TotalSegmentator [wasserthal_totalsegmentator_2023]. Representative Statistical Shape Models are then extracted, manually annotated, and propagated over multiple CT scans. Finally, the annotations are simulated via DeepDRR to generate our training data.
  • Figure 3: Example training episode from the NMDID dataset. Visualization of safe insertion into vertebra T12 via the left pedicle at multiple timesteps. Top: anterior-posterior (AP) view; bottom: lateral view, with post-processing applied. Red rectangles mark cropped regions around the target vertebra. Coloured dots indicate the remaining trajectory as projected cannula tip positions.
  • Figure 4: Safety grading on unseen episodes from NMDID dataset. Left: Grade distribution of breaches (A: no breach, B: $\le$2 mm, C: 2–4 mm, D: 4–6 mm, E: $\ge$6 mm or extra-pedicular). Right: episodes per grade for each vertebral level. Acceptance is defined as Grades A+B, totaling 73.4%.
  • Figure 5: Geometric agreement with safe generated plans. Top: entry-point distance (mm, left) and angular offset (degrees, right) grouped by vertebral level. Bottom: the same metrics grouped by pedicle side. Boxes show the inter-quartile range (IQR) with the median line; whiskers denote 1.5$\times$IQR; points are outliers.
  • ...and 4 more figures
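
The breach grading scheme in the Figure 4 caption (A: no breach, B: ≤2 mm, C: 2–4 mm, D: 4–6 mm, E: ≥6 mm or extra-pedicular) and the acceptance criterion (Grades A+B) can be written as a short sketch. This is an illustration only: the function names are hypothetical, and the handling of the exact boundary values at 2 mm and 4 mm (assigned here to the lower grade) is an assumption, since the caption does not specify it.

```python
def breach_grade(breach_mm, extra_pedicular=False):
    """Map a pedicle breach distance in mm to a grade from A to E.

    Boundary handling is an assumption: exactly 2 mm maps to B and
    exactly 4 mm maps to C; 6 mm or more maps to E, as in the caption.
    """
    if extra_pedicular or breach_mm >= 6.0:
        return "E"
    if breach_mm <= 0.0:
        return "A"  # no breach
    if breach_mm <= 2.0:
        return "B"
    if breach_mm <= 4.0:
        return "C"
    return "D"


def acceptance_rate(grades):
    """Fraction of episodes graded A or B (the acceptance criterion)."""
    if not grades:
        raise ValueError("grades must not be empty")
    accepted = sum(1 for g in grades if g in ("A", "B"))
    return accepted / len(grades)


# Hypothetical breach distances (mm) for four episodes, not data from the paper.
example_grades = [breach_grade(d) for d in (0.0, 1.5, 3.0, 7.2)]
example_rate = acceptance_rate(example_grades)
```

For the four hypothetical episodes above, the grades are A, B, C, and E, so half of them count as accepted.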