EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images

Lixing Tan, Shuang Song, Yaofeng He, Kangneng Zhou, Tong Lu, Ruoxiu Xiao

TL;DR

The paper tackles reconstructing 3-D vertebral volumes from bi-planar X-ray projections, a fundamentally ill-posed task due to information loss. It introduces EAR, an auto-encoder-based framework augmented with a Frequency Enhancement Module (FEM) and a 3-D Edge Attention Module (EAM) to bolster edge and high-frequency details, combined with an edge-displacement loss, a frequency-domain loss, and a projection loss. Evaluations on three public spine datasets with synthetic DRRs show EAR achieves superior metrics across MSE, MAE, Dice, PSNR, SSIM, and FD compared to PSR, SIT, E2E, and BX2S, with ablations confirming each component’s contribution. The work demonstrates end-to-end, edge-focused 3-D spine reconstruction from limited-view X-rays, offering precise vertebral morphology for clinical planning and potential intraoperative guidance, while outlining future directions toward diffusion-based multi-view generation and neural-field representations for better generalization. The learning objective, underpinned by the edge and frequency terms, is $\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{recon} + \lambda_2 \mathcal{L}_{edge} + \lambda_3 \mathcal{L}_{freq} + \lambda_4 \mathcal{L}_{proj}$.
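
The weighted objective in the TL;DR can be illustrated with a minimal sketch. The function name `total_loss`, the per-term values, and the weights below are hypothetical and chosen only for illustration; this summary does not state the values of $\lambda_1$ to $\lambda_4$ used in the paper.

```python
def total_loss(recon, edge, freq, proj, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum L_total = l1*L_recon + l2*L_edge + l3*L_freq + l4*L_proj."""
    l1, l2, l3, l4 = weights
    return l1 * recon + l2 * edge + l3 * freq + l4 * proj


# Hypothetical per-term loss values and weights (not taken from the paper).
example = total_loss(
    recon=0.20, edge=0.10, freq=0.30, proj=0.05,
    weights=(1.0, 0.5, 0.5, 0.1),
)
```

In a training loop each argument would be the scalar produced by the corresponding loss term for the current batch, so changing a weight only rescales that term's share of the gradient.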

Abstract

X-ray imaging eases diagnosis and treatment thanks to its rapid acquisition and high resolution. However, the projection process of X-ray imaging loses much spatial information. To provide accurate spinal morphological and structural information, it is essential to reconstruct the 3-D structures of the spine from 2-D X-ray images. Current reconstruction methods struggle to preserve the edge information and local shapes of asymmetrical vertebrae structures. In this study, we propose a new Edge-Aware Reconstruction network (EAR) that focuses on improving the reconstruction of edge information and vertebrae shapes. The network uses an auto-encoder architecture as its backbone and adds an edge attention module and a frequency enhancement module to strengthen the perception of edges during reconstruction. We also combine four loss terms: reconstruction loss, edge loss, frequency loss and projection loss. The proposed method is evaluated on three publicly accessible datasets and compared with four state-of-the-art models. It outperforms the other methods, achieving 25.32%, 15.32%, 86.44%, 80.13%, 23.7612 and 0.3014 for MSE, MAE, Dice, SSIM, PSNR and frequency distance, respectively. Because the reconstruction process is end-to-end and accurate, EAR can provide sufficient 3-D spatial information and precise guidance for preoperative surgical planning.
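
For readers unfamiliar with the reported metrics, the sketch below gives common textbook definitions of MSE, MAE, Dice and PSNR on flattened voxel lists. It is an assumption that these match the paper's evaluation: the binarization threshold of 0.5, the intensity range `max_value=1.0`, and any percent scaling are not specified in this summary, and SSIM and frequency distance are omitted.

```python
import math


def mse(pred, target):
    """Mean squared error over paired voxel intensities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error over paired voxel intensities."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dice(pred, target, threshold=0.5):
    """Dice overlap after binarizing both volumes (threshold is an assumption)."""
    p = [v >= threshold for v in pred]
    t = [v >= threshold for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    return 1.0 if total == 0 else 2.0 * intersection / total


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, max_value]."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


# Toy 4-voxel "volumes" used only to exercise the functions.
pred = [0.0, 1.0, 1.0, 0.0]
target = [0.0, 1.0, 0.0, 0.0]
```

Lower is better for MSE and MAE, while higher is better for Dice and PSNR, which is the direction in which the abstract's figures should be read.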

Paper Structure

This paper contains 22 sections, 14 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Overview of the proposed EAR for vertebrae reconstruction: an encoder, a frequency enhancement module, an edge attention module, an attention-based shortcut and a decoder.
  • Figure 2: Bi-planar X-rays for 3-D reconstruction (A). The referential spine is displayed in 3-D view and in both anterior-posterior (AP) and lateral (Lat) projections. Single vertebrae models (B). Digitally reconstructed radiographs generated from 3-D models (C) showing the simulated AP and Lat views of the vertebrae.
  • Figure 3: The diagram of the frequency enhancement module. FFT denotes the fast Fourier transform. Imaginary and real parts after performing FFT are denoted as Imag and Real, respectively. IFFT denotes the inverse fast Fourier transform.
  • Figure 4: The diagram of the edge attention module (EAM). The features from different layers of the encoder, denoted as $e_1$, $e_2$ and $e_3$, are input to the EAM to generate an output attention map $A_e$.
  • Figure 5: Example of the reconstructed spine and single vertebrae from different locations of the spine.
  • ...and 6 more figures
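
The Figure 3 caption describes a flow of FFT, separate real and imaginary parts, and IFFT. A minimal 1-D sketch of that flow follows, under stated assumptions: the scalar gains `real_gain` and `imag_gain` are hypothetical stand-ins for whatever learned processing the module applies to each part (not described in this summary), and a plain discrete Fourier transform on a short list replaces the FFT on multi-dimensional feature maps.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
        for k in range(n)
    ]


def frequency_branch(signal, real_gain=1.0, imag_gain=1.0):
    """Transform, process real and imaginary parts separately, then invert."""
    spectrum = dft(signal)
    real = [c.real * real_gain for c in spectrum]  # hypothetical "Real" branch
    imag = [c.imag * imag_gain for c in spectrum]  # hypothetical "Imag" branch
    merged = [complex(r, i) for r, i in zip(real, imag)]
    # Keep only the real part of the result; with unequal gains the inverse
    # transform is generally complex and the imaginary part is discarded here.
    return [v.real for v in idft(merged)]


signal = [0.0, 1.0, 0.0, -1.0]
roundtrip = frequency_branch(signal)  # unit gains: recovers the input
doubled = frequency_branch(signal, real_gain=2.0, imag_gain=2.0)
```

With unit gains the branch reduces to a forward and inverse transform, so it returns the input up to floating-point error; any enhancement comes entirely from what is done to the two parts in between.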