HC$^3$L-Diff: Hybrid conditional latent diffusion with high frequency enhancement for CBCT-to-CT synthesis

Shi Yin, Hongqi Tan, Li Ming Chong, Haofeng Liu, Hui Liu, Kang Hao Lee, Jeffrey Kit Loong Tuan, Dean Ho, Yueming Jin

TL;DR

A novel hybrid conditional latent diffusion model for efficient and accurate CBCT-to-CT synthesis that outperforms state-of-the-art methods in terms of sCT quality and generation efficiency and shows great potential for enhancing real-world adaptive radiotherapy.

Abstract

Background: Cone-beam computed tomography (CBCT) plays a crucial role in image-guided radiotherapy, but artifacts and noise make CBCT images unsuitable for accurate dose calculation. Artificial intelligence methods have shown promise in enhancing CBCT quality to produce synthetic CT (sCT) images. However, existing methods either produce images of suboptimal quality or incur excessive time costs, and so fail to satisfy clinical practice standards. Methods and materials: We propose a novel hybrid conditional latent diffusion model for efficient and accurate CBCT-to-CT synthesis, named HC$^3$L-Diff. We employ the Unified Feature Encoder (UFE) to compress images into a low-dimensional latent space, which reduces the computational cost. In addition to the CBCT images themselves, we integrate their high-frequency information as a hybrid condition that guides the diffusion model to generate sCT images with preserved structural details. This high-frequency information is captured by our High-Frequency Extractor (HFE). During inference, we use a denoising diffusion implicit model (DDIM) for rapid sampling. We construct a new in-house prostate dataset with paired CBCT and CT to validate the effectiveness of our method. Results: Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of sCT quality and generation efficiency. Moreover, our medical physicist conducted dosimetric evaluations to validate the benefit of our method in practical dose calculation; it achieves a 93.8% gamma passing rate with a 2%/2mm criterion, superior to other methods. Conclusion: The proposed HC$^3$L-Diff efficiently achieves high-quality CBCT-to-CT synthesis in just over 2 minutes per patient. Its promising performance in dose calculation shows great potential for enhancing real-world adaptive radiotherapy.
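The abstract states that a denoising diffusion implicit model is used for rapid sampling at inference. As a rough illustration of why this is fast, the following is a minimal sketch of the standard deterministic DDIM update (with the stochasticity parameter set to zero) applied over a short subsequence of time steps. It is not the authors' implementation: the noise schedule, the function names, and the `predict_noise` callable (a stand-in for the conditional U-Net) are all assumptions made for illustration.

```python
import numpy as np


def ddim_step(z_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step."""
    # Estimate the clean latent implied by the current noise prediction.
    z0_pred = (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the (lower) noise level of the earlier step.
    return np.sqrt(alpha_bar_prev) * z0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred


def ddim_sample(z_T, condition, predict_noise, alpha_bars, timesteps):
    """Run DDIM over a decreasing subsequence of time steps.

    `predict_noise(z, t, condition)` is a hypothetical stand-in for the
    conditional noise predictor; `alpha_bars` is an assumed cumulative
    noise schedule indexed by time step.
    """
    z = z_T
    for i, t in enumerate(timesteps):
        # After the last listed step, jump to the noise-free level (alpha_bar = 1).
        if i + 1 < len(timesteps):
            alpha_bar_prev = alpha_bars[timesteps[i + 1]]
        else:
            alpha_bar_prev = 1.0
        eps_pred = predict_noise(z, t, condition)
        z = ddim_step(z, eps_pred, alpha_bars[t], alpha_bar_prev)
    return z


# Assumed linear beta schedule with 1000 training steps, sampled in only 5 steps.
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
timesteps = [999, 749, 499, 249, 0]
```

Because each update jumps directly between the chosen noise levels, the number of network evaluations equals the length of `timesteps` rather than the full length of the training schedule.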

Paper Structure

This paper contains 16 sections, 11 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of our proposed HC$^3$L-Diff: (a) In the forward diffusion process, the UFE $\mathcal{E}$ encodes the CT image $y$ into latent space representation $z_{0}$ with 4 channels to realize image compression, and the noise is gradually added to get the standard Gaussian noise $z_{T}$. (b) During the conditional reverse denoising process, the CBCT image $x$ and the obtained high-frequency image $x_{h}$, derived from the HFE, are encoded with $\mathcal{E}$ respectively to acquire latent features $z_{x}$ and $z_{xh}$. These features are integrated in the latent space and serve as the hybrid condition for U-Net, which predicts noise at each time step to facilitate step-by-step denoising until generating $\hat{z}_{0}$. Finally, the decoder $\mathcal{D}$ is used to convert it from latent space back into pixel space and get the sCT $\hat{y}$. (c) The HFE processes CBCT images by applying FFT and FFT shift to preserve high-frequency information, and then utilizes IFFT to get the high-frequency images of CBCT.
  • Figure 2: Qualitative comparison of different generative methods. The four rows show prostate images from four different patients.
  • Figure 3: HU value difference maps of CT images and sCT images generated by different methods. Each row corresponds to the results from one patient.
  • Figure 4: Panels (a) and (b) show boxplots of the GPRs between the sCT and the ground truth CT for three different dose thresholds, using the 3%/3mm and 2%/2mm gamma criteria respectively. Panels (c) and (d) show the percentage difference in the DVH parameters (D95, D98, and Dmax) between the sCT and the ground truth CT for the prostate and the PTV respectively.
  • Figure 5: Dose distributions from a VMAT delivery, computed on the ground truth CT and on the sCTs from three different algorithms. The yellow and red color washes show the 95% and 100% isodose regions. The prostate and the PTV are shown by the blue and red curves respectively.
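The Figure 1(c) caption describes the HFE as applying an FFT and FFT shift to a CBCT image, keeping the high-frequency information, and then applying an IFFT. A minimal sketch of that kind of pipeline is given below. The caption does not say how the high-frequency band is selected, so the circular high-pass mask and the `cutoff_radius` parameter are assumptions made for illustration, as is the function name.

```python
import numpy as np


def extract_high_frequency(image, cutoff_radius=8):
    """Return the high-frequency component of a 2D image.

    Assumption: frequencies within `cutoff_radius` bins of the spectrum
    centre are treated as low-frequency and zeroed out.
    """
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape

    # FFT, then shift so the zero-frequency (DC) term sits at the array centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the centre of the shifted spectrum.
    yy, xx = np.ogrid[:rows, :cols]
    distance = np.sqrt((yy - rows // 2) ** 2 + (xx - cols // 2) ** 2)

    # High-pass mask: keep only bins outside the low-frequency disc.
    filtered = spectrum * (distance > cutoff_radius)

    # Undo the shift and transform back to image space.
    high_freq = np.fft.ifft2(np.fft.ifftshift(filtered))

    # The mask is symmetric about the centre, so the imaginary part is only
    # floating-point residue and can be dropped.
    return high_freq.real
```

With this mask, a constant image maps to (numerically) zero, since all of its energy lies in the DC term, while edges and fine texture are retained. In the pipeline described in the caption, the resulting image $x_{h}$ would then be encoded alongside the CBCT image $x$.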