C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

Yiqun Lin; Jiewen Yang; Hualiang Wang; Xinpeng Ding; Wei Zhao; Xiaomeng Li

TL;DR

This work tackles sparse-view cone-beam CT reconstruction by framing it as a 3D representation problem and introducing cross-regional and cross-view learning. It proposes C^2RV, which combines multi-scale 3D volumetric representations (MS-3DV) with scale-view cross-attention (SVC-Att) to fuse voxel-aligned and view-aligned features for accurate attenuation estimation. Across chest and knee datasets, C^2RV achieves consistent, significant improvements over state-of-the-art methods in PSNR and SSIM, while also delivering better segmentation alignment in downstream tasks. The approach reduces reliance on dense projections, enabling high-quality reconstructions with fewer views and showing robustness to mild variations in scanning parameters.

Abstract

Cone-beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios such as diagnosis and preoperative planning. Reconstructing CT from fewer projection views, known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging because the measurement process, based on cone-shaped X-ray beams, increases the dimensionality of the problem. CBCT reconstruction is a 2D-to-3D problem. Although implicit neural representations have been introduced to enable efficient training, previous works consider only local features and process different views equally, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV, which leverages explicit multi-scale volumetric representations to enable cross-regional learning in 3D space. Additionally, a scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.

Paper Structure (16 sections, 12 equations, 7 figures, 5 tables)

Introduction
Related Work
Sparse-View CT Reconstruction
Sparse-View CBCT Reconstruction
Sparse-View 3D Reconstruction
Methodology
Revisit DIF-Net (Lin et al., 2023)
C$^\text{2}$RV Framework
Network Training
Experiments
Experimental Setting
Results
Ablation Study
Proposed MS-3DV and SVC-Att
Robustness Analysis
...and 1 more section

Figures (7)

Figure 1: (a) Cone-shaped X-ray beams are emitted from the scanning source and a 2D array of detectors measures the transmitted radiation. (b) Cross-regional (red) and cross-view (green) feature learning to enhance point-wise representation.
Figure 2: Right-left (RL) and anterior-posterior (AP) views of the knee. Green: femur. Red: tibia. Yellow: patella. Blue: fibula. The patella and femur overlap in the AP view but not in the RL view.
Figure 3: Overview of the proposed sparse-view reconstruction framework C$^\text{2}$RV. Given multi-view projections, a 2D encoder-decoder extracts a view-wise feature map $\mathcal{F}_i$, which is queried for the pixel-aligned feature $\mathcal{F}_i(p)$. Additionally, the output feature map $F^1$ of the encoder is downsampled to obtain multi-scale feature maps. At each scale $s$, multi-view features are back-projected into 3D space and gathered to form the 3D volumetric representation $\hat{\mathcal{F}}^s$, which is queried for the voxel-aligned feature $\hat{\mathcal{F}}^s(p)$. Finally, multi-scale voxel-aligned features and multi-view pixel-aligned features are aggregated via scale-view cross-attention modules to estimate the attenuation coefficient.
Figure 4: Overview of the scale-view cross-attention (SVC-Att) module. In each SVC-Att module, self-attention is first applied to the multi-view features, followed by cross-attention between the multi-scale features and the multi-view features. $M$ SVC-Att modules are stacked, followed by a final linear layer that estimates the attenuation coefficient.
Figure 5: Visualization of 6-view reconstructed chest CT. From top to bottom: axial, coronal, and sagittal slices.
...and 2 more figures
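
The data flow described in the Figure 4 caption (self-attention over multi-view features, then cross-attention between multi-scale and multi-view features, stacked and followed by a linear layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: single-head attention without learned query/key/value projections, residual connections, the multi-scale features acting as queries in the cross-attention, mean pooling over scales before the linear layer, and all function names (`attention`, `svc_att_block`, `estimate_attenuation`).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d) -> output: (n_q, d).
    No learned projections are used (an illustrative simplification).
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def svc_att_block(scale_feats, view_feats):
    """One simplified SVC-Att-style block for a single 3D query point.

    scale_feats: (S, d) voxel-aligned features, one row per scale.
    view_feats:  (N, d) pixel-aligned features, one row per projection view.
    """
    # Step 1: self-attention among the multi-view features (residual is assumed).
    view_feats = view_feats + attention(view_feats, view_feats, view_feats)
    # Step 2: cross-attention; ASSUMPTION: scale features are the queries,
    # view features are the keys and values.
    scale_feats = scale_feats + attention(scale_feats, view_feats, view_feats)
    return scale_feats, view_feats


def estimate_attenuation(scale_feats, view_feats, weight, bias, num_blocks=2):
    """Stack `num_blocks` blocks (M in the caption), then apply a linear layer."""
    for _ in range(num_blocks):
        scale_feats, view_feats = svc_att_block(scale_feats, view_feats)
    pooled = scale_feats.mean(axis=0)  # ASSUMPTION: mean-pool over scales
    return float(pooled @ weight + bias)


# Usage with random stand-in features: 3 scales, 6 views, 8 channels.
rng = np.random.default_rng(0)
mu = estimate_attenuation(
    scale_feats=rng.normal(size=(3, 8)),
    view_feats=rng.normal(size=(6, 8)),
    weight=rng.normal(size=8),
    bias=0.0,
)
print(mu)
```

In this sketch the attention weights let each query point weight views and scales differently, which is one plausible reading of aggregating multi-scale and multi-view features "adaptively". A real model would add learned projections, multiple heads, and normalization.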
