ILV: Iterative Latent Volumes for Fast and Accurate Sparse-View CT Reconstruction

Seungryong Lee; Woojeong Baek; Joosang Lee; Eunbyung Park

Abstract

A long-term goal in CT imaging is to achieve fast and accurate 3D reconstruction from sparse-view projections, thereby reducing radiation exposure, lowering system cost, and enabling timely imaging in clinical workflows. Recent feed-forward approaches have shown strong potential toward this goal, yet their results still suffer from artifacts and loss of fine detail. In this work, we introduce Iterative Latent Volumes (ILV), a feed-forward framework that integrates data-driven priors with classical iterative reconstruction principles to overcome key limitations of prior feed-forward models in sparse-view cone-beam CT (CBCT) reconstruction. At its core, ILV constructs an explicit 3D latent volume that is repeatedly updated by conditioning on multi-view X-ray features and a learned anatomical prior, enabling the recovery of fine structural details beyond the reach of prior feed-forward models. In addition, we develop and incorporate several architectural components, including an X-ray feature volume, group cross-attention, efficient self-attention, and view-wise feature aggregation, that efficiently realize this latent volume refinement. Extensive experiments on a large-scale dataset of approximately 14,000 CT volumes demonstrate that ILV significantly outperforms existing feed-forward and optimization-based methods in both reconstruction quality and speed. These results show that ILV enables fast and accurate sparse-view CBCT reconstruction suitable for clinical use. The project page is available at: https://sngryonglee.github.io/ILV/.
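For readers unfamiliar with the "classical iterative reconstruction principles" the abstract refers to, the sketch below shows one textbook example, the Landweber iteration, on a toy linear system. It is not ILV's learned update and is not taken from the paper: the matrix `A`, the sizes, the step size, and the function name `landweber` are all illustrative assumptions. It only shows the general pattern of repeatedly correcting an estimate using the mismatch with the measurements.

```python
import numpy as np

def landweber(A, b, n_iters=200, step=None):
    """Classical Landweber iteration: x <- x + step * A^T (b - A x).

    A is a hypothetical stand-in for a projection operator, not ILV's.
    """
    if step is None:
        # Any step in (0, 2 / sigma_max^2) converges; use 1 / sigma_max^2.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        residual = b - A @ x             # mismatch in measurement space
        x = x + step * (A.T @ residual)  # back-project the mismatch and update
    return x

# Toy "sparse-view" setting: fewer measurements (12) than unknowns (16).
rng = np.random.default_rng(0)
A = rng.normal(size=(12, 16))
x_true = rng.normal(size=16)
b = A @ x_true

x_hat = landweber(A, b)
initial_residual = np.linalg.norm(b)            # residual of the zero estimate
final_residual = np.linalg.norm(b - A @ x_hat)  # residual after the iterations
```

Because the toy system has fewer measurements than unknowns, many volumes fit the data equally well, so driving the residual down does not by itself recover `x_true`. This is the gap that the abstract's data-driven priors are meant to address.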

Paper Structure (21 sections, 10 equations, 19 figures, 14 tables)

Related work
Method
Overview
Multi-view X-ray Encoding
Latent Volume Update
Gaussian Volume Decoding
CT Volume Refinement
Loss Function
Experiment
Experimental Settings
Evaluation
Comparison Across Model Sizes
Ablation Studies
Fine-tuning for Improving Visual Quality
Conclusion
...and 6 more sections

Figures (19)

Figure 1: Comparison of the proposed method and state-of-the-art approaches in PSNR (dB) vs. Runtime (sec) plot. Runtime is presented on a logarithmic scale.
Figure 2: Overview of the proposed ILV. Given multi-view X-ray images, ILV reconstructs a 3D CT volume or synthesizes novel-view projections. The overall network consists of four stages: (1) Multi-view X-ray image encoding, (2) Latent volume update, (3) Gaussian volume decoding, and (4) CT volume refinement.
Figure 3: Qualitative comparison with traditional and feed-forward methods on CT reconstruction in the 10-view setting.
Figure 4: Qualitative comparison with optimization-based methods on CT reconstruction in the 24-view setting.
Figure 5: Qualitative comparison on novel view X-ray synthesis.
...and 14 more figures
