From Healthy Scans to Annotated Tumors: A Tumor Fabrication Framework for 3D Brain MRI Synthesis
Nayu Dong, Townim Chowdhury, Hieu Phan, Mark Jenkinson, Johan Verjans, Zhibin Liao
TL;DR
The paper tackles the scarcity of annotated MRI tumor data by introducing Tumor Fabrication (TF), a two-stage framework that turns healthy 3D brain scans into realistic tumor-bearing images with paired labels, using only a small set of real annotated tumor data. In the first stage, TF-Aug generates coarse image–label pairs through ROI-based augmentation of healthy scans. In the second, TF-GAN refines these pairs into high-fidelity tumor-bearing volumes using a dual-head discriminator and class-wise perceptual guidance. On BraTS 2023, enriching training data with TF samples yields statistically significant segmentation gains in low-data regimes, outperforms the CarveMix and Pix2Pix baselines, and remains robust to varying volumes of synthetic data. The results highlight the practical potential of leveraging abundant healthy data to enrich supervised learning in clinical AI. The authors also acknowledge limitations in edema realism and mass-effect modeling, and outline directions toward more realistic, customizable synthesis.
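To make the coarse stage concrete, here is one possible ROI-based paste written with NumPy. It is a sketch under stated assumptions, not the authors' TF-Aug implementation: the function name `coarse_tumor_paste`, the box-shaped ROI, the caller-chosen placement corner, and the optional brain mask are all hypothetical. Details such as ROI selection, intensity harmonization, and deformation are omitted. The function copies labelled tumor voxels from a real annotated crop into a healthy volume and returns the matching coarse label map. This is the kind of imperfect image–label pair a refinement network would then be asked to make realistic.

```python
import numpy as np


def coarse_tumor_paste(healthy, roi_img, roi_lbl, corner, brain_mask=None):
    """Hypothetical ROI-based coarse synthesis (not the authors' TF-Aug code).

    healthy    : (D, H, W) float array, a healthy 3D scan.
    roi_img    : (d, h, w) float array cropped around a real annotated tumor.
    roi_lbl    : (d, h, w) int array; 0 = background, >0 = tumor sub-region classes.
    corner     : (z, y, x) voxel index where the ROI's first voxel is placed.
    brain_mask : optional (D, H, W) bool array; tumor voxels outside it are dropped.
    Returns a coarse (image, label) pair with the same shape as `healthy`.
    """
    # Reject placements where the ROI would extend past the volume boundary.
    if any(c < 0 or c + s > full
           for c, s, full in zip(corner, roi_lbl.shape, healthy.shape)):
        raise ValueError("ROI does not fit inside the healthy volume at this corner")

    image = healthy.astype(np.float32)  # astype returns a copy, so the input is untouched
    label = np.zeros(healthy.shape, dtype=np.int64)
    z, y, x = corner
    d, h, w = roi_lbl.shape
    region = (slice(z, z + d), slice(y, y + h), slice(x, x + w))

    # Only voxels labelled as tumor are transplanted; surrounding healthy tissue is kept.
    tumor = roi_lbl > 0
    if brain_mask is not None:
        tumor = tumor & brain_mask[region]

    # Slicing with a tuple of slices yields views, so these assignments write into image/label.
    image[region][tumor] = roi_img[tumor]
    label[region][tumor] = roi_lbl[tumor]
    return image, label
```

A hard paste like this leaves visible seams and no mass effect. Presumably this is why a learned refinement stage follows the coarse one rather than using such pairs directly.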
Abstract
The scarcity of annotated Magnetic Resonance Imaging (MRI) tumor data presents a major obstacle to accurate and automated tumor segmentation. While existing data synthesis methods offer promising solutions, they often suffer from key limitations: manual modeling is labor-intensive and requires expert knowledge, while deep generative models, which can augment both data and annotations, typically require large amounts of paired training data to begin with, a requirement that is impractical in data-limited clinical settings. In this work, we propose Tumor Fabrication (TF), a novel two-stage framework for unpaired 3D brain tumor synthesis. The framework comprises a coarse tumor synthesis process followed by a refinement process powered by a generative model. TF is fully automated and leverages only healthy image scans, along with a limited amount of real annotated data, to synthesize large volumes of paired synthetic data for enriching downstream supervised segmentation training. We demonstrate that our synthetic image-label pairs, used for data enrichment, can significantly improve performance on downstream tumor segmentation tasks in low-data regimes, offering a scalable and reliable solution for medical image enrichment and addressing critical challenges of data scarcity in clinical AI applications.
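The enrichment step can be illustrated with a small helper that mixes the limited real set with a controllable number of synthetic pairs. That number is the knob varied when checking robustness to synthetic data volume. The name `enrich_training_set` and the `synth_per_real` ratio parameter are hypothetical, chosen for illustration. The sampling scheme actually used in the paper is not described in this section.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def enrich_training_set(real: Sequence[T], synthetic: Sequence[T],
                        synth_per_real: float, seed: int = 0) -> List[T]:
    """Mix a small real set with synthetic pairs (hypothetical helper).

    synth_per_real sets the synthetic volume, e.g. 0 (real only), 1, or 4
    synthetic pairs per real pair; it is capped by the synthetic pool size.
    """
    if synth_per_real < 0:
        raise ValueError("synth_per_real must be non-negative")
    n_synth = min(len(synthetic), round(synth_per_real * len(real)))
    rng = random.Random(seed)  # seeded so the same mixture can be reproduced
    mixed = list(real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(mixed)  # interleave real and synthetic pairs
    return mixed
```

Keeping every real pair and only varying how many synthetic pairs are added makes the real data the fixed baseline. Any change in segmentation performance can then be attributed to the synthetic enrichment.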
