
X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography

Yifan Liu, Wuyang Li, Weihao Yu, Chenxin Li, Alexandre Alahi, Max Meng, Yixuan Yuan

TL;DR

X-GRM addresses the challenge of reconstructing 3D CT volumes from sparse-view X-ray projections by combining a large-scale feed-forward transformer architecture with a flexible volume representation called VoxGS. The method decouples projection encoding (via an X-ray Reconstruction Transformer composed of an Encoder ViT and a Fusion ViT) from volume decoding with voxel-centered Gaussian primitives, enabling efficient, differentiable X-ray rendering and direct volume extraction. Trained on a large, diverse CT dataset, X-GRM achieves state-of-the-art reconstruction quality and fast inference across multiple sparse-view settings, generalizes well across datasets, and supports novel-view synthesis. This combination promises practical impact for low-dose, time-sensitive clinical workflows and opens avenues for integrated CT/X-ray applications and downstream rendering tasks.
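To make the encode-then-fuse split concrete, here is a minimal NumPy sketch of the token flow, not the authors' implementation. It assumes single-head attention layers as stand-ins for the Encoder ViT and Fusion ViT, random weights in place of learned ones, and a hypothetical per-view embedding; all sizes (`NUM_VIEWS`, `IMG`, `PATCH`, `DIM`) are illustrative. The encoder stage attends within each view only, and the fusion stage attends over the concatenation of all views' tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only; not the paper's configuration.
NUM_VIEWS, IMG, PATCH, DIM = 10, 64, 16, 32

def patchify(img, patch):
    """Split an (H, W) projection into flattened non-overlapping patches."""
    h, w = img.shape
    p = img.reshape(h // patch, patch, w // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v

# Random stand-ins for learned weights (assumption: one layer per stage).
w_embed = rng.normal(0, 0.02, (PATCH * PATCH, DIM))
enc_w = [rng.normal(0, 0.02, (DIM, DIM)) for _ in range(3)]
fus_w = [rng.normal(0, 0.02, (DIM, DIM)) for _ in range(3)]
view_embed = rng.normal(0, 0.02, (NUM_VIEWS, DIM))  # hypothetical per-view embedding

def encode_views(projections):
    # Encoder stage: each view is tokenized and attended to independently.
    per_view = []
    for v, img in enumerate(projections):
        tokens = patchify(img, PATCH) @ w_embed + view_embed[v]
        per_view.append(self_attention(tokens, *enc_w))
    # Fusion stage: concatenate all views so attention mixes tokens across views.
    fused = np.concatenate(per_view, axis=0)
    return self_attention(fused, *fus_w)

projections = rng.random((NUM_VIEWS, IMG, IMG))
tokens = encode_views(projections)
print(tokens.shape)  # (160, 32)
```

Keeping the encoder per-view makes its attention cost grow linearly with the number of views, so only the fusion stage pays the quadratic cost over all tokens.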

Abstract

Computed Tomography (CT) serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction methods are limited by small-capacity model architectures and inflexible volume representations. In this work, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feed-forward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode sparse-view X-ray inputs, efficiently integrating tokens from different views. These tokens are then decoded into a novel volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. This combination of a high-capacity model and a flexible volume representation empowers our model to produce high-quality reconstructions from various test inputs, including in-domain and out-of-domain X-ray projections. Our code is available at: https://github.com/CUHK-AIM-Group/X-GRM.
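One plausible reading of VoxGS's direct volume extraction is sketched below under stated assumptions: each voxel of an n³ grid carries one isotropic Gaussian centered on it, with a density and a scale (in the paper these are decoded from the transformer tokens; here they are plain arrays), and the CT volume is obtained by evaluating the resulting Gaussian mixture at the voxel centers. The isotropic shape and the dense all-pairs evaluation are simplifications for illustration, and the function names are hypothetical.

```python
import numpy as np

def voxel_centers(n):
    """Centers of an n^3 grid spanning [-1, 1]^3, shape (n^3, 3), ordered [z][y][x]."""
    step = 2.0 / n
    axis = -1.0 + step * (np.arange(n) + 0.5)
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xx, yy, zz], axis=-1).reshape(-1, 3)

def gaussian_field(points, means, density, scale):
    """Sum of isotropic 3D Gaussians evaluated at query points.

    points: (P, 3); means: (G, 3); density, scale: (G,)
    """
    d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (P, G)
    return (density * np.exp(-d2 / (2.0 * scale**2))).sum(-1)

def extract_volume(density, scale, n):
    """Query the Gaussian field at the same voxel centers the Gaussians sit on."""
    centers = voxel_centers(n)
    return gaussian_field(centers, centers, density, scale).reshape(n, n, n)

n = 8
rng = np.random.default_rng(0)
density = rng.random(n**3)
scale = np.full(n**3, 0.5 * 2.0 / n)  # half a voxel width (assumed value)
volume = extract_volume(density, scale, n)
print(volume.shape)  # (8, 8, 8)
```

With scales much smaller than the voxel spacing, the extracted volume reduces to the per-voxel densities; larger scales blend neighboring voxels.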

Paper Structure

This paper contains 33 sections, 11 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Our method achieves state-of-the-art reconstruction quality while maintaining the fastest runtime. (a) Qualitative results: DIF-Gaussian [dif_gs] produces over-smoothed results (red boxes), while R²-Gaussian [r2_gaussian] exhibits noise artifacts (orange boxes) and is time-consuming. In contrast, our method achieves better fidelity in a much shorter time. (b) Performance and runtime comparison: metrics are evaluated on the test set of the collected large-scale dataset.
  • Figure 1: Statistics of the collected datasets.
  • Figure 2: X-GRM is a large feed-forward transformer trained on a curated large-scale CT reconstruction dataset. (a) The X-ray Reconstruction Transformer efficiently encodes and fuses tokens from multiple X-ray projections, and (b) Voxel-based Gaussian Splatting enables both efficient CT volume extraction and differentiable X-ray rendering.
  • Figure 3: Qualitative comparison with traditional and feedforward methods. Results shown are from the test set reconstructions with 10-view inputs.
  • Figure 4: Qualitative comparison with self-supervised models. Results shown are from the test set reconstructions with 10-view inputs.
  • ...and 5 more figures
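Figure 2(b) credits VoxGS with differentiable X-ray rendering. The sketch below illustrates why Gaussian primitives suit this, assuming isotropic Gaussians and a simple parallel-beam geometry along the z axis (clinical scanners typically use cone-beam geometry, and the paper's renderer may differ): the line integral of a 3D Gaussian along a ray has a closed form, so each primitive splats to a 2D Gaussian on the detector, and the Beer-Lambert law converts integrated attenuation into detected intensity. Function names are hypothetical.

```python
import numpy as np

def splat_parallel_z(means, density, scale, det_u, det_v):
    """Parallel-beam X-ray projection along z of isotropic 3D Gaussians.

    Integrating rho * exp(-|x - mu|^2 / (2 s^2)) over z gives
    rho * sqrt(2*pi) * s * exp(-((u - mu_x)^2 + (v - mu_y)^2) / (2 s^2)),
    so each Gaussian splats to a 2D Gaussian on the detector.
    means: (G, 3); density, scale: (G,); det_u, det_v: 1D detector coordinates.
    Returns the (len(det_v), len(det_u)) image of line integrals.
    """
    uu, vv = np.meshgrid(det_u, det_v, indexing="xy")  # (V, U)
    du = uu[..., None] - means[:, 0]  # (V, U, G)
    dv = vv[..., None] - means[:, 1]
    amp = density * np.sqrt(2.0 * np.pi) * scale  # (G,)
    return (amp * np.exp(-(du**2 + dv**2) / (2.0 * scale**2))).sum(-1)

def to_intensity(line_integral, i0=1.0):
    """Beer-Lambert attenuation: detected intensity I = I0 * exp(-integral)."""
    return i0 * np.exp(-line_integral)

means = np.array([[0.1, -0.2, 0.3], [-0.3, 0.2, -0.1]])
density = np.array([2.0, 1.0])
scale = np.array([0.15, 0.1])
u = np.linspace(-1.0, 1.0, 64)
v = np.linspace(-1.0, 1.0, 64)
projection = splat_parallel_z(means, density, scale, u, v)
image = to_intensity(projection)
print(image.shape)  # (64, 64)
```

Because every term is a smooth closed-form function of the means, densities, and scales, an autodiff framework can backpropagate a projection loss to the Gaussian parameters.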