X2BR: High-Fidelity 3D Bone Reconstruction from a Planar X-Ray Image with Hybrid Neural Implicit Methods

Gokce Guven, H. Fatih Ugurdag, Hasan F. Ates

TL;DR

This work tackles single-view 3D bone reconstruction from planar X-rays by introducing X2B, a ConvNeXt-based occupancy network that reconstructs high-fidelity skeletal volumes without priors, and X2BR, a template-guided refinement that enforces anatomical plausibility via a biomechanical template and GBCPD++ non-rigid registration. X2B achieves state-of-the-art numerical accuracy (IoU around 0.95; Chamfer-L1 around 0.005), while X2BR improves anatomical realism, yielding better rib curvature and vertebral alignment through template alignment despite slightly lower IoU. The approach trains on a large real-patient dataset of 3D bone meshes paired with digitally reconstructed radiographs (DRRs) and integrates a full pipeline from segmentation (TotalSegmentator) to occupancy calculation, MISE-based mesh extraction, and biomechanically informed registration. Overall, X2B and X2BR form a hybrid framework that balances quantitative accuracy with clinical interpretability, enabling applications in surgical planning and patient-specific biomechanical simulations.

Abstract

Accurate 3D bone reconstruction from a single planar X-ray remains a challenge due to anatomical complexity and limited input data. We propose X2BR, a hybrid neural implicit framework that combines continuous volumetric reconstruction with template-guided non-rigid registration. The core network, X2B, employs a ConvNeXt-based encoder to extract spatial features from X-rays and predict high-fidelity 3D bone occupancy fields without relying on statistical shape models. To further refine anatomical accuracy, X2BR integrates a patient-specific template mesh, constructed using YOLOv9-based detection and the SKEL biomechanical skeleton model. The coarse reconstruction is aligned to the template using geodesic-based coherent point drift, enabling anatomically consistent 3D bone volumes. Experimental results on a clinical dataset show that X2B achieves the highest numerical accuracy, with an IoU of 0.952 and Chamfer-L1 distance of 0.005, outperforming recent baselines including X2V and D2IM-Net. Building on this, X2BR incorporates anatomical priors via YOLOv9-based bone detection and biomechanical template alignment, leading to reconstructions that, while slightly lower in IoU (0.875), offer superior anatomical realism, especially in rib curvature and vertebral alignment. This numerical accuracy vs. visual consistency trade-off between X2B and X2BR highlights the value of hybrid frameworks for clinically relevant 3D reconstructions.
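The two metrics quoted above can be illustrated with a minimal NumPy sketch. It is not the authors' evaluation code: the function names `volumetric_iou` and `chamfer_l1` are hypothetical, the default threshold of 0.2 is borrowed from the Figure 3 caption and may not be the value used for evaluation, and the Chamfer-L1 definition (the mean of the two directed mean nearest-neighbour distances) follows one common convention that the paper may or may not share.

```python
import numpy as np

def volumetric_iou(occ_pred, occ_gt, threshold=0.2):
    """IoU between thresholded predicted occupancies and binary ground truth.

    Both inputs hold one value per query point (same shape).
    The threshold default is an assumption taken from the Figure 3 caption.
    """
    pred = np.asarray(occ_pred) >= threshold
    gt = np.asarray(occ_gt) >= 0.5  # ground truth is treated as 0/1 labels
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both empty: define IoU as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def chamfer_l1(points_a, points_b):
    """Symmetric Chamfer distance between two (N, 3) and (M, 3) point sets.

    Brute-force pairwise distances, so only suitable for small samples.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    a_to_b = dists.min(axis=1).mean()  # each point of a to its nearest in b
    b_to_a = dists.min(axis=0).mean()  # each point of b to its nearest in a
    return float(0.5 * (a_to_b + b_to_a))
```

For meshes, the point sets would typically be sampled from the predicted and ground-truth surfaces, and a KD-tree would replace the dense distance matrix for realistic sample sizes.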

Paper Structure

This paper contains 27 sections, 2 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: X2B network training pipeline. The figure illustrates the training process of the X2B network, which takes a DRR as input and uses a ConvNeXt backbone for feature extraction. The extracted features are passed through dense blocks with Conditional Batch Normalization (CBN) layers, parameterized by $\beta_i$ and $\gamma_i$, to refine the latent representations.
  • Figure 2: X2BR model architecture. An anterior-posterior DRR is used as input to both YOLOv9 and X2B at inference. Regions are detected by the YOLOv9 network, and a DRR-specific template mesh is extracted from the template mesh model using the detected regions. The DRR-specific template model is registered to the coarse shape, which is the output of the X2BR model.
  • Figure 3: For 3D mesh inference with X2B model, a modified Multiresolution IsoSurface Extraction (MISE) algorithm for high-resolution mesh extraction is integrated, starting with a base resolution and evaluating against the occupancy network. The occupancy threshold is set at $\tau = 0.2$ for balance in accuracy and completeness. The process involves subdividing voxels until the desired resolution is reached, using Marching Cubes for mesh generation, and refining the mesh with Fast-Quadric-Mesh-Simplification and gradient optimization. Our method achieves efficient and accurate mesh inference, optimized for an initial resolution of $32^3$, and is capable of extracting mesh normals effectively.
  • Figure 4: Comparison of X2B and GT. The first column displays the DRR inputs, while the second column presents the GT meshes. The subsequent columns show the X2B reconstructions (X2B-f), X2B heatmaps (X2B-h-f), GT-b, X2B-b, and X2B heatmaps in the back view (X2B-h-b).
  • Figure 5: Comparison of reconstruction results across different methods. The figure presents DRR images (leftmost column) and their corresponding GT 3D meshes alongside reconstructed outputs from various methods: D2IM, ED2IF2, X2V, X2B, and X2BR. Each row corresponds to a different DRR input, while the columns illustrate the progression of reconstruction quality across the methods.
  • ...and 2 more figures
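The Conditional Batch Normalization (CBN) layers named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the X2B implementation: the function name, the tensor shapes, and the use of a single linear map from the image feature vector `cond` to the scale $\gamma$ and shift $\beta$ are all hypothetical choices modelled on how occupancy networks commonly condition on image features.

```python
import numpy as np

def conditional_batch_norm(x, cond, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Batch-normalise x, then scale and shift it with values predicted from cond.

    x:    (batch, channels, points) activations of a dense block
    cond: (batch, cond_dim) image feature vector (e.g. from the encoder)
    w_*:  (cond_dim, channels) weights, b_*: (channels,) biases (hypothetical)
    """
    # Normalise each channel over the batch and point dimensions.
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Predict per-sample, per-channel scale and shift from the conditioning vector.
    gamma = cond @ w_gamma + b_gamma  # (batch, channels)
    beta = cond @ w_beta + b_beta     # (batch, channels)

    return gamma[:, :, None] * x_hat + beta[:, :, None]
```

The point of the design is that the normalisation statistics are shared across the batch, while $\gamma$ and $\beta$ differ per input image, so the same dense blocks can produce different occupancy fields for different radiographs.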