Prompting Lipschitz-constrained network for multiple-in-one sparse-view CT reconstruction

Baoshun Shi, Ke Jiang, Qiusheng Lian, Xinran Yu, Huazhu Fu

TL;DR

The paper tackles sparse-view CT reconstruction by introducing LipNet, a Lipschitz-constrained prior network, and PromptCT, a storage-efficient deep unfolding framework that handles multiple sparse-view configurations with a single model. LipNet provides a provable boundary property and Lipschitz continuity, while PromptCT unrolls a proximal-gradient-like solver with LipNet as the proximal operator, using explicit prompts to encode view information. The approach achieves superior reconstruction quality across single- and multi-view settings, with significantly reduced storage costs, and demonstrates convergence and stability through theoretical analysis and extensive experiments on synthetic and real data. The methods offer practical benefits for clinical deployment by enabling a universal model for diverse sampling scenarios and providing theoretical guarantees for convergence and robustness.

Abstract

Despite significant advancements in deep learning-based sparse-view computed tomography (SVCT) reconstruction algorithms, these methods still encounter two primary limitations: (i) it is challenging to explicitly prove that the prior networks of deep unfolding algorithms satisfy Lipschitz constraints, because these networks are designed empirically; (ii) the substantial storage cost of training a separate model for each sparse-view setting hinders practical clinical application. To address these issues, we design an explicitly provable Lipschitz-constrained network, dubbed LipNet, and integrate an explicit prompt module that provides discriminative knowledge of the different sparse sampling settings, so that multiple sparse-view configurations can be handled within a single model. Furthermore, we develop a storage-saving deep unfolding framework for multiple-in-one SVCT reconstruction, termed PromptCT, which embeds LipNet as its prior network to ensure the convergence of its corresponding iterative algorithm. In simulated and real data experiments, PromptCT outperforms benchmark reconstruction algorithms in multiple-in-one SVCT reconstruction, achieving higher-quality reconstructions with lower storage costs. On the theoretical side, we explicitly demonstrate that LipNet satisfies a boundary property, use this to prove its Lipschitz continuity, and then analyze the convergence of the proposed iterative algorithms. The data and code are publicly available at https://github.com/shibaoshun/PromptCT.
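To make the deep unfolding idea concrete, the following is a minimal sketch of a generic unrolled proximal gradient loop: each stage takes a gradient step on the data-fidelity term and then applies a prior operator. It is not the authors' implementation. The names `unrolled_pgd`, `matvec`, `rmatvec`, and `soft_shrink` are hypothetical, the small dense matrix `A` stands in for the sparse-view projection operator, and a fixed soft shrinkage stands in for the learned prior network.

```python
def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r (the adjoint, playing the role of back-projection)."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_shrink(x, t):
    """Element-wise soft shrinkage with a scalar threshold t >= 0."""
    return [max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0) for v in x]


def unrolled_pgd(A, y, prior, step, n_stages):
    """Run n_stages of: gradient step on 0.5*||A x - y||^2, then the prior.

    `prior` is any callable mapping an image estimate to a cleaned estimate.
    In a learned unfolding network it would be a trained network; here it is
    an assumption-free placeholder supplied by the caller.
    """
    x = rmatvec(A, y)  # simple back-projection initialisation (an assumption)
    for _ in range(n_stages):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = prior([xi - step * g for xi, g in zip(x, grad)])
    return x


# Toy under-determined system: 2 measurements, 4 unknowns.
A = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [1.0, 2.0]

# Identity prior: plain gradient descent on the data term.
x_plain = unrolled_pgd(A, y, lambda v: v, step=0.4, n_stages=50)
# Shrinkage prior: a hand-crafted stand-in for a learned prior network.
x_sparse = unrolled_pgd(A, y, lambda v: soft_shrink(v, 0.05), step=0.4, n_stages=50)
```

The step size 0.4 is chosen below 2 divided by the squared spectral norm of this particular `A` (which is 2), the usual condition for the gradient step to be stable. Swapping the `prior` callable is the only change needed to try a different regulariser in this sketch.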

Paper Structure

This paper contains 29 sections, 23 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: (a) The proposed deep unfolding network PromptCT performs SVCT reconstruction across different numbers of projection views (e.g., 60, 90, 120, and 180) with a single model to aid medical diagnosis and treatment. Each stage incorporates the image update module (IUM) and the prompt-based LipNet, which satisfies the Lipschitz constraint. (b) Average PSNR (dB) and storage cost (MB) of the proposed SVCT method and existing SVCT methods under four sampling views. The proposed strategy significantly reduces storage costs while improving reconstruction performance. Notably, MLipCT is the multi-view model of the backbone architecture (LipCT) without the explicit prompt module, and PromptCT is the multi-view model with the explicit prompt module.
  • Figure 2: The network architecture of LipNet. Each stage of this network consists of an analysis frame $\bm{W}$, a soft shrinkage operation, a synthesis frame $\bm{W}^{\rm{T}}$, and a constant-generating sub-network (CGNet), forming a single-layer sparse representation model-driven architecture. The element-wise products of the generated proportional constants $\bm{c}$ and the elements in the input noisy map $\bm{s}$ are used as the thresholds $\bm{e}$ for shrinking frame coefficients. The network architecture of CGNet comprises the shallow feature extraction module, the deep feature extraction module, and the explicit prompt module, utilized for generating spatial-variant proportional constants. Among them, the shallow feature extraction contains two $3\times3$ convolutional layers, and the deep feature extraction consists of STB and SFB for extracting local, regional, and global feature information, respectively. In addition, we embed view prompts between these two feature extraction modules to guide the differentiation of multiple sparse sampling views.
  • Figure 3: The proposed deep unfolding proximal gradient descent network architecture (i.e., PromptCT) consists of the IUM and the prompting Lipschitz-constrained network $\mathcal{D}_{\theta}(\cdot;\sigma)$ (i.e., LipNet).
  • Figure 4: Visual comparison of SVCT reconstruction methods on the AAPM dataset under different sparse-view settings. The display window is [-175, 500] HU. Regions of interest are zoomed in for better viewing. Blue arrows indicate key areas, such as the sternum, intestinal tissues, and tissues near the femoral artery, where the methods differ in reconstruction quality. Among single-view models, our LipCT outperforms the other benchmark methods. Among multi-view models, our PromptCT, which incorporates explicit prompts, outperforms MLipCT.
  • Figure 5: Visual comparison of different SVCT methods on the DeepLesion dataset with 60 sparse views. Regions of interest are zoomed in for better viewing. Blue arrows indicate the bone structure, where the methods differ in reconstruction quality. Among single-view models, our LipCT outperforms the other benchmark methods. Among multi-view models, our PromptCT, which incorporates explicit prompts, outperforms MLipCT.
  • ...and 8 more figures
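
The analysis, shrinkage, and synthesis structure described in the Figure 2 caption can be illustrated with a small sketch. This is a toy under stated assumptions, not the LipNet architecture: a one-level orthonormal Haar transform stands in for the learned frame $\bm{W}$, the proportional constants `c` are passed in as fixed numbers rather than generated by CGNet, and the map `s` is assumed to have the same length as the coefficient vector. The function names are hypothetical. With fixed thresholds and an orthonormal transform, this operator is non-expansive (1-Lipschitz), which is the kind of property the paper sets out to prove for its network.

```python
import math


def haar_analysis(x):
    """One-level orthonormal Haar transform W x (stand-in for the analysis frame)."""
    r = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * r for i in range(0, len(x), 2)]
    return approx + detail


def haar_synthesis(z):
    """Apply W^T to z; this inverts haar_analysis because W is orthonormal."""
    r = 1.0 / math.sqrt(2.0)
    half = len(z) // 2
    x = []
    for a, d in zip(z[:half], z[half:]):
        x.extend([(a + d) * r, (a - d) * r])
    return x


def soft_shrink(z, e):
    """Element-wise soft shrinkage of coefficients z with thresholds e >= 0."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v, t in zip(z, e)]


def shrinkage_stage(x, c, s):
    """Compute W^T soft(W x, e) with thresholds e = c * s (element-wise).

    x must have even length; c and s must match the coefficient length.
    In the paper c comes from a sub-network; here it is a fixed input.
    """
    e = [ci * si for ci, si in zip(c, s)]
    return haar_synthesis(soft_shrink(haar_analysis(x), e))


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Example call on a short signal with uniform constants and noise map.
signal = [1.0, 1.2, 0.9, 3.0]
cleaned = shrinkage_stage(signal, c=[0.5] * 4, s=[0.3] * 4)
```

Setting every entry of `c` to zero turns the stage into the identity, since the thresholds vanish and the synthesis transform undoes the analysis transform.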