Cross-Modal Guidance for Fast Diffusion-Based Computed Tomography

Timofey Efimov, Singanallur Venkatakrishnan, Maliha Hossain, Haley Duba-Sullivan, Amirkoushyar Ziabari

TL;DR

This work proposes incorporating an additional modality without retraining the diffusion prior, enabling accelerated imaging of costly modalities, and examines the impact of imperfect side modalities on cross-modal guidance.

Abstract

Diffusion models have emerged as powerful priors for solving inverse problems in computed tomography (CT). In certain applications, such as neutron CT, collecting many measurements is expensive even for a single scan, which leads to sparse datasets from which high-quality reconstructions are difficult to obtain even with diffusion models. One strategy to mitigate this challenge is to leverage a complementary, easily available imaging modality; however, such approaches typically require retraining the diffusion model with large datasets. In this work, we propose incorporating an additional modality without retraining the diffusion prior, enabling accelerated imaging of costly modalities. We further examine the impact of imperfect side modalities on cross-modal guidance. Our method is evaluated on sparse-view neutron computed tomography, where reconstruction quality is substantially improved by incorporating X-ray computed tomography of the same samples.

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 1 table, and 1 algorithm.

Figures (4)

  • Figure 1: The cross-modal approach decouples fitting main modality measurements from enforcing cross-modal consistency. In the first step, we fine-tune the diffusion model weights to better fit the data consistency loss, then obtain the reconstruction estimate via any diffusion-based inverse problem solver. After that stage, cross-modal consistency is enforced via a lightweight image translation model. The cross-modal consistency block can be seamlessly incorporated as a separate module without modifying the prior.
  • Figure 2: Paired degraded NCT/XCT samples (left) and their high-quality reconstructions (right). These examples illustrate the training data used for cross-modal translation.
  • Figure 3: Comparison between XCT images (top row) and corresponding ideal NCT images (bottom row). XCT quality varies widely relative to NCT.
  • Figure 4: Qualitative comparison between D3IP and our cross-modal guided algorithm across different numbers of projection views ($8$, $16$, $32$, and $64$ views, in rows 1–4, respectively). Within each row, the first column shows the unimodal (D3IP) reconstruction, the second column shows the reconstruction from our cross-modal guided algorithm, and the third column shows the ground-truth sample.
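
The two-stage flow described in the Figure 1 caption (fit the main-modality measurements first, then enforce cross-modal consistency as a separate module) can be illustrated with a toy sketch. This is not the authors' algorithm. Everything in it is an assumption made for illustration: the random matrix `A` stands in for a sparse-view projection operator, plain gradient descent stands in for the fine-tuned diffusion prior and inverse problem solver, the `translate` function is a hypothetical stand-in for the lightweight image translation model, and the weighted average in `cross_modal_step` is only one possible way to enforce consistency.

```python
import numpy as np

def data_consistency_loss(x, A, y):
    """Squared residual between simulated and measured data: ||A x - y||^2."""
    r = A @ x - y
    return float(r @ r)

def fit_measurements(x0, A, y, steps=200):
    """Stage 1 stand-in (assumption): gradient descent on the data-consistency loss.
    The paper uses a diffusion prior and a diffusion-based solver here instead."""
    lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(steps):
        x -= lr * 2.0 * A.T @ (A @ x - y)  # gradient of ||A x - y||^2
    return x

def cross_modal_step(x_main, x_side, translate, weight=0.5):
    """Stage 2 stand-in (assumption): blend the estimate with the translated side modality."""
    return (1.0 - weight) * x_main + weight * translate(x_side)

# Toy problem: 16 unknowns, only 6 measurements (a sparse-view analogue).
rng = np.random.default_rng(0)
n, m = 16, 6
x_true = rng.random(n)
A = rng.standard_normal((m, n))       # hypothetical forward operator
y = A @ x_true                        # noiseless main-modality measurements

# Imperfect side modality: different contrast (scaled by 2) plus small noise.
x_side = 2.0 * x_true + 0.05 * rng.standard_normal(n)
translate = lambda s: 0.5 * s         # hypothetical side-to-main translation

x_init = np.zeros(n)
x_stage1 = fit_measurements(x_init, A, y)
x_final = cross_modal_step(x_stage1, x_side, translate)

err_stage1 = float(np.linalg.norm(x_stage1 - x_true))
err_final = float(np.linalg.norm(x_final - x_true))
print(f"stage 1 error: {err_stage1:.3f}, after cross-modal step: {err_final:.3f}")
```

Because the second stage only consumes the first stage's output, the translation module can be swapped or removed without touching the stage 1 solver, which mirrors the modularity the caption describes. In this toy blend the final estimate no longer fits the measurements exactly, so the `weight` parameter trades data fidelity against agreement with the side modality.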