SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model
Chun Xie, Yuichi Yoshii, Itaru Kitahara
TL;DR
This work tackles multi-view X-ray synthesis from a single view to reduce radiation exposure and streamline clinical workflows. It introduces SV-DRR, a view-conditioned Diffusion Transformer operating in the VAE latent space, with a weak-to-strong training regime and two conditioning streams to preserve anatomical fidelity across large viewpoint changes. The approach outperforms state-of-the-art methods on quantitative metrics, and in expert assessments its outputs were indistinguishable in realism from diffusion-based simulations. The densely sampled LIDC-IDRI-DRR dataset and the proposed conditioning framework enable robust high-resolution multi-view X-ray generation, with practical implications for clinical education, data augmentation, and sparse-view imaging research.
Abstract
X-ray imaging is a rapid and cost-effective tool for visualizing internal human anatomy. While multi-view X-ray imaging provides complementary information that enhances diagnosis, intervention, and education, acquiring images from multiple angles increases radiation exposure and complicates clinical workflows. To address these challenges, we propose a novel view-conditioned diffusion model for synthesizing multi-view X-ray images from a single view. Unlike prior methods, which are limited in angular range, resolution, and image quality, our approach leverages the Diffusion Transformer to preserve fine details and employs a weak-to-strong training strategy for stable high-resolution image generation. Experimental results demonstrate that our method generates higher-resolution outputs with improved control over viewing angles. This capability has significant implications not only for clinical applications but also for medical education and data augmentation, enabling the creation of diverse, high-quality datasets for training and analysis. Our code is available at https://github.com/xiechun298/SV-DRR.
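The abstract does not say how the target viewing angle is passed to the model. As a minimal sketch of what view conditioning can look like, the snippet below encodes a relative azimuth and elevation as sinusoidal features, a common way to turn angles into a conditioning vector. The function name `view_embedding`, the two-angle parameterization, and the `num_freqs` parameter are illustrative assumptions and are not taken from the SV-DRR code.

```python
import math

def view_embedding(azimuth_deg, elevation_deg, num_freqs=4):
    """Encode a relative viewing angle as sinusoidal features.

    Hypothetical helper for illustration only; the actual SV-DRR
    conditioning may differ.
    """
    features = []
    for angle_deg in (azimuth_deg, elevation_deg):
        angle = math.radians(angle_deg)
        for k in range(num_freqs):
            freq = 2 ** k  # integer frequencies keep the encoding periodic in 360 degrees
            features.append(math.sin(freq * angle))
            features.append(math.cos(freq * angle))
    return features

# Example: conditioning vector for a view rotated 30 degrees in azimuth.
cond = view_embedding(30.0, 0.0)
print(len(cond))
```

Because only integer multiples of the angle are used, views at 0 and 360 degrees map to the same vector, which avoids a discontinuity when the viewpoint wraps around the patient.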
