Memory-Efficient 3D High-Resolution Medical Image Synthesis Using CRF-Guided GANs
Mahshid Shiri, Alessandro Bruno, Daniele Loiacono
TL;DR
This work tackles the memory bottleneck of 3D high-resolution medical image synthesis by introducing CRF-GAN, a memory-efficient architecture that enforces cross-patch consistency through a Conditional Random Field (CRF) applied to a mid-level image embedding. The generator is split into a two-step process (embedding generation, then patch rendering from a subset of that embedding), and a half-encoder bridges real and synthetic embeddings to train the CRF without extra networks. CRF-GAN achieves better fidelity (lower FID and MMD) than the state-of-the-art HA-GAN at $256^3$, while also reducing model size, memory usage, and training time, and it benefits downstream augmentation tasks by improving recall and F1 in a nodule-detection setting. This approach offers a scalable path to high-quality 3D medical image synthesis with practical implications for data augmentation and clinical realism across CT and MRI domains.
Abstract
Generative Adversarial Networks (GANs) have many potential medical imaging applications. Because of the limited memory of Graphics Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models either cannot scale to high resolution or are susceptible to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that uses a Conditional Random Field (CRF) to model dependencies, so that it can generate consistent 3D medical images without excessive memory use. To this end, the generator is divided into two parts during training. The first part produces an intermediate representation, and a CRF is applied to this representation to capture correlations. The second part produces a random sub-volume of the image from a subset of the intermediate representation. This structure has two advantages: first, the correlations are modeled using the features that the generator is trying to optimize; second, the generator can generate full high-resolution images during inference. Experiments on lung CTs and brain MRIs show that our architecture outperforms the state of the art while using less memory and having lower complexity.
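The two-part generator described above can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the function names (`generate_embedding`, `render`, `training_patch`), the tensor sizes, and the nearest-neighbour renderer are all hypothetical, and the CRF step on the embedding is omitted. It shows only the memory-saving idea: during training, a random slab of the embedding is rendered into a sub-volume, while at inference the whole embedding is rendered into the full volume.

```python
import numpy as np

def generate_embedding(z, channels=4, size=8):
    # Hypothetical stage 1: a fixed random linear map from the latent
    # vector to a mid-level embedding of shape (C, D, H, W).
    w = np.random.default_rng(42).standard_normal((channels * size**3, z.size))
    return np.tanh(w @ z).reshape(channels, size, size, size)

def render(embedding, scale=2):
    # Hypothetical stage 2: average the channels, then upsample each
    # spatial axis by nearest-neighbour repetition.
    vol = embedding.mean(axis=0)
    for axis in range(3):
        vol = np.repeat(vol, scale, axis=axis)
    return vol

def training_patch(embedding, rng, slab=2, scale=2):
    # Training-time path: render only a random slab of the embedding
    # along the depth axis, so the full volume is never materialised.
    d0 = int(rng.integers(0, embedding.shape[1] - slab + 1))
    return render(embedding[:, d0:d0 + slab], scale), d0

rng = np.random.default_rng(0)
z = rng.standard_normal(16)
emb = generate_embedding(z)           # (4, 8, 8, 8) embedding
patch, d0 = training_patch(emb, rng)  # (4, 16, 16) sub-volume
full = render(emb)                    # (16, 16, 16) full volume at inference

# With this purely local toy renderer, the patch matches the same
# region of the full volume exactly.
assert np.allclose(patch, full[2 * d0 : 2 * d0 + 4])
```

In this toy the patch and the full volume agree exactly because the renderer has no spatial context. A convolutional renderer would see different neighbourhoods at patch borders, which is where modeling correlations on the shared embedding becomes relevant.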
