Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans
Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao
TL;DR
This work tackles end-to-end multi-view X-ray image synthesis from CT scans when only unpaired data are available. It introduces CT2X-GAN, a disentanglement-based framework that separates anatomical content from image style using style decoupling encoders across the CT, X-ray, and DRR domains, complemented by consistency regularization and a pose attention module that preserves geometry across views. The model uses DRR-based supervision and a DRR channel to constrain style learning, and it achieves high-quality multi-view X-ray synthesis on a dataset derived from CTSpine1K, with strong FID, KID, and LPIPS scores and favorable qualitative results compared with the 3D-aware baselines π-GAN and EG3D. The authors acknowledge that the approach does not fully preserve all anatomical details or distance perception, and they suggest stronger structural constraints and distance-aware modeling as future work to broaden its applicability in clinical workflows.
Abstract
X-ray images play a vital role in intraoperative procedures owing to their high resolution and fast imaging speed, and they greatly facilitate subsequent segmentation, registration, and reconstruction. However, excessive X-ray exposure poses potential risks to human health. Data-driven algorithms that map volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data, and existing methods mainly work by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN that synthesizes X-ray images in an end-to-end manner by disentangling content and style across three different image domains. Our method decouples the anatomical structure information from CT scans and the style information from unpaired real X-ray images and digitally reconstructed radiograph (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized and real X-ray images. We also impose a supervised process by computing the similarity between real (computed) DRR images and synthesized DRR images. We further develop a pose attention module that strengthens the information carried by the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower-dimensional 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset, achieving 97.8350, 0.0842, and 3.0938 in terms of FID, KID, and a defined user-scored X-ray similarity, respectively. Compared with 3D-aware methods (π-GAN, EG3D), CT2X-GAN achieves higher synthesis quality and closer resemblance to real X-ray images.
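For readers unfamiliar with DRRs, the following is a minimal sketch of how a DRR-like image can be computed from a CT volume by integrating attenuation along parallel rays (the Beer-Lambert law). It is an illustration only and not the authors' pipeline: the function names `hu_to_mu` and `parallel_drr`, the attenuation constant `MU_WATER_PER_MM`, the parallel-beam geometry, and the toy phantom are all assumptions made for this example. Practical DRR generators typically cast diverging (cone-beam) rays at arbitrary poses.

```python
import math

# Assumed effective linear attenuation of water (1/mm); illustrative value only.
MU_WATER_PER_MM = 0.02


def hu_to_mu(hu):
    """Convert a Hounsfield unit value to a linear attenuation coefficient (1/mm)."""
    # HU is defined as 1000 * (mu - mu_water) / mu_water; clamp negatives to zero.
    return max(0.0, MU_WATER_PER_MM * (1.0 + hu / 1000.0))


def parallel_drr(volume, voxel_mm=1.0):
    """Project a volume indexed [z][y][x] along the y axis with parallel rays.

    Returns a 2D image indexed [z][x] holding the transmitted intensity
    fraction exp(-line integral of mu), a value in (0, 1].
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    image = []
    for z in range(depth):
        row = []
        for x in range(width):
            # Discrete line integral of attenuation along the ray through (z, x).
            line_integral = voxel_mm * sum(
                hu_to_mu(volume[z][y][x]) for y in range(height)
            )
            row.append(math.exp(-line_integral))
        image.append(row)
    return image


# Toy phantom: air (-1000 HU) everywhere, with a dense column (+1000 HU) at x = 1.
volume = [[[-1000.0 for _ in range(3)] for _ in range(4)] for _ in range(2)]
for z in range(2):
    for y in range(4):
        volume[z][y][1] = 1000.0

drr = parallel_drr(volume)
```

In this toy phantom, rays that pass only through air are fully transmitted, while rays through the dense column are attenuated, which is the contrast mechanism a DRR shares with a real radiograph. Changing the projection axis or rotating the volume before projecting would give other views.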
