Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

TL;DR

This work introduces SAX-NeRF, a structure-aware framework for sparse-view X-ray 3D reconstruction. It replaces the conventional MLP-based radiodensity learner with a Line Segment-based Transformer (Lineformer) that captures dependencies among the points within each line segment of an X-ray, and couples it with a Masked Local-Global (MLG) ray sampling strategy that extracts both local and global information from 2D projections. The approach is evaluated on the newly released X3D dataset, where it shows substantial gains in novel view synthesis and CT reconstruction over state-of-the-art NeRF-based methods. Ablations indicate that Line Segment-based Multi-head Self-Attention (LS-MSA) and MLG sampling bring complementary benefits. The work also provides a larger-scale X-ray benchmark and demonstrates robustness to reduced projection counts, which suggests practical potential for low-dose X-ray 3D imaging.

Abstract

X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet existing neural radiance fields (NeRF) algorithms overlook this important property of X-ray, which limits their ability to capture the structural content of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. Firstly, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Secondly, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information from 2D projections. In addition, we collect a larger-scale dataset, X3D, covering a wider range of X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction, respectively. Code, models, and data are released at https://github.com/caiyuanhao1998/SAX-NeRF
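
The idea of modeling dependencies within each line segment of an X-ray can be illustrated with a small sketch. The code below is not the released implementation: it is a minimal pure-Python illustration that assumes a single attention head, identity query/key/value projections, and equal-length segments. The names `self_attention` and `line_segment_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(segment):
    """Single-head scaled dot-product self-attention over one segment.

    Assumption: Q = K = V = segment (identity projections) to keep the
    sketch short; a trained model would use learned projections.
    """
    dim = len(segment[0])
    out = []
    for q in segment:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in segment]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, segment))
                    for c in range(dim)])
    return out


def line_segment_attention(ray_features, num_segments):
    """Split the point features of one ray into equal segments and
    apply self-attention inside each segment independently."""
    n = len(ray_features)
    if n % num_segments != 0:
        raise ValueError("number of points must be divisible by num_segments")
    seg_len = n // num_segments
    out = []
    for start in range(0, n, seg_len):
        out.extend(self_attention(ray_features[start:start + seg_len]))
    return out


# Hypothetical toy ray: 8 sampled points, each with a 2-D feature.
ray = [[float(i), 1.0] for i in range(8)]
attended = line_segment_attention(ray, num_segments=4)
```

Because attention is restricted to a segment, a point only mixes with its neighbours on the same stretch of the ray. For N points split into M segments, the pairwise score computation drops from N² to M·(N/M)² = N²/M dot products.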

Paper Structure

This paper contains 14 sections, 18 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: Comparisons of X-ray novel view synthesis. On the collected X3D dataset, our method surpasses state-of-the-art algorithms including InTo (InTomo), NeRF, NeAT, NAF, and TeRF (TensoRF) by 10.91, 15.03, 5.13, and 13.76 dB in PSNR on the scenes of medicine, biology, security, and industry. The average gains are over 12 dB. The visual comparisons of our method and the second-best algorithms on four scenes (pelvis, bonsai, box, and engine) show that our method yields more perceptually pleasing results.
  • Figure 2: Visible light vs. X-ray. Visible light imaging relies on reflection. X-ray imaging is based on penetration and attenuation.
  • Figure 3: Overview of our method. (a) SAX-NeRF uses (i) MLG strategy to sample an X-ray batch $\mathcal{R}$. Then $N$ point positions $\mathbf{P}$ on each X-ray $\mathbf{r} \in \mathcal{R}$ are sampled and input into (ii) Lineformer to produce the radiodensity $\mathbf{D}$. (b) Line Segment-based Attention Block (LSAB) is the basic unit of Lineformer. It captures inner structural dependencies by (c) Line Segment-based Multi-head Self-Attention (LS-MSA).
  • Figure 4: Comparison of ray sampling. (a) The naive strategy samples X-rays that land on scattered pixels. (b) Our MLG strategy performs pixel- and patch-level sampling on foreground regions.
  • Figure 5: Qualitative results of novel view synthesis on the scenes of backpack (top) and carp (bottom). Please zoom in for a better view.
  • ...and 3 more figures
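
The pixel- and patch-level sampling on foreground regions described in the Figure 4 caption can be sketched roughly as follows. This is an illustrative approximation, not the authors' code: it assumes the foreground is found by a simple intensity threshold and that a patch qualifies only when all of its pixels are foreground. The function names and parameters (`foreground_mask`, `sample_mlg`, `threshold`, `patch_size`) are hypothetical.

```python
import random


def foreground_mask(projection, threshold):
    """Mark pixels brighter than a threshold as foreground (assumed rule)."""
    return [[value > threshold for value in row] for row in projection]


def sample_mlg(projection, threshold, num_patches, patch_size, num_pixels, rng):
    """Return (patch_pixels, scattered_pixels) as lists of (row, col)."""
    mask = foreground_mask(projection, threshold)
    height, width = len(mask), len(mask[0])

    # Local part: top-left corners of patches lying fully in the foreground.
    corners = [
        (r, c)
        for r in range(height - patch_size + 1)
        for c in range(width - patch_size + 1)
        if all(mask[r + dr][c + dc]
               for dr in range(patch_size) for dc in range(patch_size))
    ]
    chosen = rng.sample(corners, min(num_patches, len(corners)))
    patch_pixels = [
        (r + dr, c + dc)
        for r, c in chosen
        for dr in range(patch_size)
        for dc in range(patch_size)
    ]

    # Global part: individual pixels scattered over the whole foreground.
    foreground = [(r, c) for r in range(height) for c in range(width)
                  if mask[r][c]]
    scattered_pixels = rng.sample(foreground, min(num_pixels, len(foreground)))
    return patch_pixels, scattered_pixels


# Hypothetical 8x8 projection: a bright 6x6 object on a dark background.
projection = [[1.0 if 1 <= r <= 6 and 1 <= c <= 6 else 0.0 for c in range(8)]
              for r in range(8)]
patches, scattered = sample_mlg(projection, threshold=0.5, num_patches=2,
                                patch_size=2, num_pixels=5,
                                rng=random.Random(0))
```

In this sketch the patch pixels supply local context (neighbouring rays), while the scattered pixels cover the object globally. Each sampled pixel would then define one X-ray in the batch, and background pixels are never selected.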