TRACE: Temporally Reliable Anatomically-Conditioned 3D CT Generation with Enhanced Efficiency

Minye Shao, Xingyu Miao, Haoran Duan, Zeyu Wang, Jingkun Chen, Yawen Huang, Xian Wu, Jingjing Deng, Yang Long, Yefeng Zheng

TL;DR

TRACE addresses the need for privacy-preserving, high-fidelity 3D CT generation by modeling volumes as sequences of 2D frames conditioned on segmentation masks and radiology reports. It introduces a multimodal, diffusion-based framework with temporal coherence via optical flow and an overlapping-frame inference strategy to support flexible axial lengths at low compute cost. The approach yields superior anatomical fidelity and temporal consistency, validated through quantitative metrics and expert radiologist evaluation, while substantially reducing training and inference resources. TRACE thus offers a practical solution for data augmentation, privacy preservation, and personalized medical modeling in resource-constrained settings.

Abstract

3D medical image generation is essential for data augmentation and patient privacy, calling for reliable and efficient models suited to clinical practice. However, current methods suffer from limited anatomical fidelity, restricted axial length, and substantial computational cost, placing them beyond the reach of regions with limited resources and infrastructure. We introduce TRACE, a framework that generates 3D medical images with spatiotemporal alignment using a 2D multimodal-conditioned diffusion approach. TRACE models sequential 2D slices as video frame pairs, combining segmentation priors and radiology reports for anatomical alignment and incorporating optical flow to sustain temporal coherence. During inference, an overlapping-frame strategy links frame pairs into a flexible-length sequence, which is then reconstructed into a spatiotemporally and anatomically aligned 3D volume. Experimental results demonstrate that TRACE balances computational efficiency with anatomical fidelity and spatiotemporal consistency. Code is available at: https://github.com/VinyehShaw/TRACE.
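
The overlapping-frame idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `chain_frame_pairs` and `toy_generate_pair` are hypothetical names, the sampler is a toy stand-in for the diffusion model, and reusing exactly one shared frame between consecutive pairs is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]  # one 2D axial slice as nested lists (toy stand-in)


def chain_frame_pairs(
    generate_pair: Callable[[Optional[Frame], int], Tuple[Frame, Frame]],
    num_slices: int,
) -> List[Frame]:
    """Link overlapping frame pairs into one axial sequence of num_slices frames."""
    if num_slices < 2:
        raise ValueError("need at least two slices to form a frame pair")
    # The first pair is generated without an overlapping frame.
    first, second = generate_pair(None, 0)
    frames = [first, second]
    while len(frames) < num_slices:
        # Assumption: the last frame is reused as the shared frame of the next pair.
        overlap = frames[-1]
        _, new_frame = generate_pair(overlap, len(frames) - 1)
        frames.append(new_frame)
    return frames  # stacking along the axial axis would give a (depth, H, W) volume


def toy_generate_pair(
    overlap: Optional[Frame], index: int, size: int = 2
) -> Tuple[Frame, Frame]:
    """Hypothetical sampler stand-in: the new frame is the overlap frame plus one."""
    if overlap is None:
        overlap = [[float(index)] * size for _ in range(size)]
    new_frame = [[value + 1.0 for value in row] for row in overlap]
    return overlap, new_frame


volume = chain_frame_pairs(toy_generate_pair, num_slices=5)
print(len(volume), [frame[0][0] for frame in volume])
```

Because each pair contributes one new frame after the first, a sequence of N slices needs N - 1 pair generations in this sketch, so the axial length is set by the loop bound rather than by the model's input size.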

Paper Structure

This paper contains 21 sections, 7 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: TRACE delivers Flexible-Length CT volumes, greatly cuts Compute Cost, and boosts Anatomical Fidelity.
  • Figure 2: TRACE models 3D CT volumes as sequences of frames, utilizing an efficient 2D diffusion model conditioned on multiple modalities to generate flexible-length, coherent CT sequences. During training, it denoises frame pairs with varying skip intervals, guided by four modalities: anatomical masks (VISTA3D), optical flow between frames (RAFT), report embeddings (CLIP), and relative position embeddings. The optical flow and text embeddings pass through trainable adapters before entering the diffusion model. Inference employs an overlapping-frame guidance strategy to synthesize semantically aligned frame pairs, generating anatomically consistent CT sequences, which are then reconstructed back into 3D volumes.
  • Figure 3: Comparison of generated results from multiple perspectives for "52 years old male: Fusiform dilatation in the thoracic aorta. Hepatomegaly, hepatosteatosis. Hiatal hernia. Hypodense nodule in the right thyroid lobe." (a) Comparison of axial slices from GenerateCT, our method, and ground truth (GT), arranged left to right from diaphragm to clavicle, with each method displaying the upper, middle, and lower thorax (frames 9-18, 174-183, and 379-388). (b) 3D rendering comparison highlighting the skeleton, thoracic cavity, and key lung structures. (c) Segmentation results on generated volumes in axial, sagittal, and coronal views, with corresponding 3D renderings.
  • Figure 4: Axial, sagittal, and coronal slices of 3D CT volumes generated by various methods, case: "26-year-old male: Findings compatible with COVID-19 pneumonia".
  • Figure 5: Ablation results for anatomical mask granularity.
  • ...and 1 more figures
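
The training setup described in the Figure 2 caption (frame pairs with varying skip intervals plus relative position information) could be sampled roughly as follows. This is a minimal sketch under stated assumptions: `sample_frame_pair`, the uniform choice of skip interval, the `max_skip` bound, and the normalization of positions to [0, 1] are all illustrative and are not taken from the paper.

```python
import random
from typing import Tuple


def sample_frame_pair(
    depth: int, max_skip: int, rng: random.Random
) -> Tuple[int, int, float, float]:
    """Pick two slice indices `skip` apart and their normalized axial positions."""
    if depth < 2 or max_skip < 1:
        raise ValueError("need depth >= 2 and max_skip >= 1")
    # Assumption: the skip interval is drawn uniformly from 1..max_skip.
    skip = rng.randint(1, min(max_skip, depth - 1))
    # Choose the first slice so that the second one stays inside the volume.
    start = rng.randint(0, depth - 1 - skip)
    end = start + skip
    # Assumption: relative position is the slice index scaled to [0, 1].
    scale = depth - 1
    return start, end, start / scale, end / scale


rng = random.Random(0)  # fixed seed only to make the demo repeatable
for _ in range(3):
    # depth=400 and max_skip=4 are illustrative values, not the paper's settings.
    print(sample_frame_pair(depth=400, max_skip=4, rng=rng))
```

The two normalized positions returned here are what a relative position embedding would encode in this sketch, telling the model where along the axial axis each frame of the pair sits.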