
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

TL;DR

This work tackles end-to-end multi-view X-ray image synthesis from CT scans when paired X-ray and CT data are unavailable. It introduces CT2X-GAN, a disentanglement-based framework that separates anatomical content from image style using a style decoupling encoder across the CT, X-ray, and DRR domains, complemented by consistency regularization and a pose attention module that preserves geometry across views. The model uses DRR-based supervision and a DRR channel to constrain style learning. On the CTSpine1K-derived dataset it is evaluated with FID, KID, and LPIPS, reaching an FID of 97.8350 and a KID of 0.0842, and it compares favorably in qualitative results with the 3D-aware baselines π-GAN and EG3D. The authors acknowledge limitations in preserving all anatomical details and in distance perception, and suggest future work on stronger structural constraints and distance-aware modeling to broaden applicability in clinical workflows.

Abstract

X-ray images play a vital role in intraoperative procedures because of their high resolution and fast imaging speed, and they support subsequent segmentation, registration, and reconstruction. However, excessive X-ray exposure poses potential risks to human health. Data-driven algorithms that map volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods mainly model the entire X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN that synthesizes X-ray images in an end-to-end manner using content and style disentanglement across three image domains. Our method decouples the anatomical structure information from CT scans and the style information from unpaired real X-ray images and digitally reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized and real X-ray images. We also impose a supervised process by computing the similarity between computed real DRR images and synthesized DRR images. We further develop a pose attention module to strengthen the information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower-dimensional 2D space. Extensive experiments on the publicly available CTSpine1K dataset achieved 97.8350, 0.0842, and 3.0938 in terms of FID, KID, and a defined user-scored X-ray similarity, respectively. Compared with 3D-aware methods ($\pi$-GAN, EG3D), CT2X-GAN achieves better synthesis quality and greater realism with respect to real X-ray images.
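
The abstract reports FID as its headline metric. As a rough illustration of what that number measures, the sketch below computes the Fréchet distance between Gaussians fitted to two sets of feature vectors. It is not the authors' evaluation code: the function name `frechet_distance` is hypothetical, and the Inception feature extraction that FID normally relies on is omitted, so the inputs here are assumed to be precomputed features.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features). For FID proper, the
    features would come from a pretrained Inception network; that step
    is omitted here (assumption: features are already extracted).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((cov_r cov_f)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f. For positive semi-definite inputs
    # these are real and non-negative up to numerical noise, so the
    # imaginary parts are dropped and small negatives are clipped.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)
```

Lower values mean the two feature distributions are closer. Identical feature sets give a distance of zero, and shifting every feature by a constant changes only the mean term.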

Paper Structure (39 sections, 11 equations, 10 figures, 4 tables)


Figures (10)

  • Figure 1: Illustration of the difference between the traditional and proposed pipeline of X-ray image synthesis from 3D volume scans.
  • Figure 2: An illustration of the proposed CT2X-GAN. A CT encoder $E_{CT}$ extracts the content code from the input CT scan $V$. The style decoupling encoder $E_{sty}$ extracts a style code $f_{sty}^X$ from the input X-ray image $I_X$. The generator $G$ combines the style code and the content code to generate the synthesized X-ray image $\hat{I}_X$. We also employ the style decoupling encoder to extract a DRR style code $f_{sty}^{DRR}$ from the DRR and use it to synthesize a stylized reconstructed DRR $\hat{I}_{DRR}$, which provides an auxiliary constraint during training. A consistency regularization term is calculated to improve domain-specific style extraction. Additionally, we employ a pose attention module to accentuate features for the target view angle based on the corresponding projection.
  • Figure 3: An illustration of consistency regularization. $f_{sty}^{X}$ and $f_{sty}^{DRR}$ represent the style codes extracted from X-ray and DRR images. $f_{sty}^{\hat{X}}$ and $f_{sty}^{\hat{DRR}}$ are the style features of the results synthesized using X-ray and DRR styles, respectively. $f_{c}^{\hat{X}}$ and $f_{c}^{\hat{DRR}}$ share the same content.
  • Figure 4: Qualitative comparison between $\pi$-GAN, EG3D and ours at a resolution of $256 \times 256$.
  • Figure 5: Qualitative ablation study for proposed modules.
  • ...and 5 more figures
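
The consistency regularization described in the Figure 3 caption can be sketched as follows. This is only an assumed form: the captions do not give the loss formula, so the L1 distance, the equal weighting of the terms, and the names `l1` and `consistency_regularization` are all hypothetical. The sketch encodes two ideas taken from the caption: the style re-extracted from a synthesized image should match the style code it was generated from, and the X-ray-styled and DRR-styled results should share the same content.

```python
import numpy as np


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two feature arrays."""
    return float(np.mean(np.abs(a - b)))


def consistency_regularization(
    f_sty_x: np.ndarray,        # style code from the real X-ray
    f_sty_drr: np.ndarray,      # style code from the DRR
    f_sty_x_hat: np.ndarray,    # style re-extracted from the synthesized X-ray
    f_sty_drr_hat: np.ndarray,  # style re-extracted from the synthesized DRR
    f_c_x_hat: np.ndarray,      # content features of the synthesized X-ray
    f_c_drr_hat: np.ndarray,    # content features of the synthesized DRR
) -> float:
    # Assumption: each synthesized image should carry the style it was given.
    style_term = l1(f_sty_x_hat, f_sty_x) + l1(f_sty_drr_hat, f_sty_drr)
    # Assumption: both outputs come from the same CT, so content should agree.
    content_term = l1(f_c_x_hat, f_c_drr_hat)
    # Assumption: the terms are summed with equal weight.
    return style_term + content_term
```

In a training loop, the six inputs would be produced by the style decoupling encoder and a content encoder applied to real and synthesized images. Here they are plain arrays so that the arithmetic is easy to inspect.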