Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics
Siyeop Yoon, Sifan Song, Pengfei Jin, Matthew Tivnan, Yujin Oh, Sekeun Kim, Dufan Wu, Xiang Li, Quanzheng Li
TL;DR
This paper tackles the challenge of generating realistic whole-body PET/CT images from demographic information to support digital twins, virtual trials, and data augmentation. It introduces a cascaded 3D diffusion framework that first generates a low-resolution, demographics-conditioned global volume with a score-based diffusion model, then refines it with a super-resolution residual diffusion model trained with patch-wise conditioning. Trained on the AutoPET dataset and evaluated on organ-wise volumes and standardized uptake value (SUV) distributions, the approach agrees closely with real data, with most uptake deviations within 3-5% in subgroup analysis. The results suggest that decoupling global structure from high-frequency detail enables scalable, population-informed synthetic imaging and offers a viable alternative to conventional phantoms.
Abstract
We propose a cascaded 3D diffusion framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. First, a score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structure and approximate metabolic activity. Second, a super-resolution residual diffusion model refines the spatial resolution. The framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data across demographic subgroups. The organ-wise comparison showed strong concordance between synthetic and real images. In the subgroup analysis, most deviations in metabolic uptake values remained within 3-5% of the ground truth. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.
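As an illustration of the organ-wise evaluation described above, the following is a minimal sketch of how an organ volume, a mean SUV, and a percent deviation between synthetic and real data could be computed. It is not the authors' evaluation code: the function names (`organ_stats`, `percent_deviation`), the integer label-map convention, and the use of the mean as the SUV summary statistic are all assumptions made for this example.

```python
import numpy as np


def organ_stats(suv, labels, organ_id, voxel_volume_ml):
    """Return (volume in mL, mean SUV) for one organ.

    Assumes `suv` is a 3D array of SUV values and `labels` is an integer
    segmentation map of the same shape (hypothetical convention).
    """
    mask = labels == organ_id
    volume_ml = float(mask.sum()) * voxel_volume_ml
    # An empty mask has no defined mean uptake.
    mean_suv = float(suv[mask].mean()) if mask.any() else float("nan")
    return volume_ml, mean_suv


def percent_deviation(synthetic, real):
    """Signed deviation of a synthetic statistic from the real one, in percent."""
    return 100.0 * (synthetic - real) / real


# Toy example: one "organ" (label 1) filling half of a small volume.
labels = np.zeros((4, 4, 4), dtype=int)
labels[:2] = 1
real_suv = np.full(labels.shape, 2.0)
synthetic_suv = np.full(labels.shape, 2.08)

_, real_mean = organ_stats(real_suv, labels, organ_id=1, voxel_volume_ml=0.5)
_, synth_mean = organ_stats(synthetic_suv, labels, organ_id=1, voxel_volume_ml=0.5)
print(f"deviation: {percent_deviation(synth_mean, real_mean):.2f}%")
```

In a subgroup analysis, the same statistics would be aggregated per demographic group (for example by sex or age band) before comparing synthetic and real cohorts.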
