
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations

Ruijie Wang, Luca Rossetto, Susan Mérillat, Christina Röcke, Mike Martin, Abraham Bernstein

TL;DR

This work tackles the scarcity of individual-specific brain MRI segmentations by introducing CSegSynth, a conditional deep generative model that synthesizes 3D WM, GM, and CSF segmentations from easily obtainable demographic, interview, and cognitive features. Training proceeds in two steps: unconditional pre-training on AOMIC ID1000, followed by conditional fine-tuning on a subset of CamCAN. Four unconditional architectures (VAE, GAN, LDM, and $\alpha$-GAN) are pre-trained, and their conditional counterparts (C-VAE, C-GAN, C-LDM, and CSegSynth) generate the individual-specific segmentations. Empirically, CSegSynth achieves state-of-the-art quality on distributional similarity metrics (MMD, 2D/3D-FID, $|\Delta SSIM|$) and yields volume-prediction MAEs of 36.44 mL, 29.20 mL, and 35.51 mL for WM, GM, and CSF, respectively, outperforming conventional regression baselines. The model also enables neuroscience applications, including gradient-based feature importance analyses and the generation of hypothetical segmentations for aging and lifestyle scenarios. The authors acknowledge limitations such as limited training data and the difficulty of validating hypothetical trajectories. Overall, CSegSynth offers a principled, scalable path to individual-specific brain structure studies under data privacy and scarcity constraints, and the paper cites publicly available code and data resources.

Abstract

To the best of our knowledge, all existing methods that can generate synthetic brain magnetic resonance imaging (MRI) scans for a specific individual require detailed structural or volumetric information about the individual's brain. However, such brain information is often scarce, expensive, and difficult to obtain. In this paper, we propose the first approach capable of generating synthetic brain MRI segmentations, specifically 3D white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) segmentations, for individuals using their easily obtainable and often readily available demographic, interview, and cognitive test information. Our approach features a novel deep generative model, CSegSynth, which outperforms existing prominent generative models, including conditional variational autoencoder (C-VAE), conditional generative adversarial network (C-GAN), and conditional latent diffusion model (C-LDM). We demonstrate the high quality of our synthetic segmentations through extensive evaluations. Also, in assessing the effectiveness of the individual-specific generation, we achieve superior volume prediction, with mean absolute errors of only 36.44 mL, 29.20 mL, and 35.51 mL between the ground-truth WM, GM, and CSF volumes of test individuals and those volumes predicted based on generated individual-specific segmentations, respectively.
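
The volume-prediction evaluation described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each segmentation is an array of shape `(3, D, H, W)` holding per-voxel WM, GM, and CSF probabilities (or binary masks), and that a tissue volume is the sum of those values times the voxel volume. The function names, the array layout, and the default 1 mm³ voxel size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def tissue_volumes_ml(seg, voxel_volume_mm3=1.0):
    """Return WM, GM, and CSF volumes in mL for one segmentation.

    seg: array of shape (3, D, H, W); this layout is an assumption.
    voxel_volume_mm3: volume of one voxel in mm^3 (assumed, not from the paper).
    """
    # Summing over the spatial axes gives the (expected) voxel count per tissue.
    voxel_counts = seg.reshape(seg.shape[0], -1).sum(axis=1)
    # 1 mL equals 1000 mm^3.
    return voxel_counts * voxel_volume_mm3 / 1000.0

def volume_mae_ml(true_segs, generated_segs, voxel_volume_mm3=1.0):
    """Mean absolute volume error per tissue (WM, GM, CSF) across individuals."""
    true_v = np.stack([tissue_volumes_ml(s, voxel_volume_mm3) for s in true_segs])
    gen_v = np.stack([tissue_volumes_ml(s, voxel_volume_mm3) for s in generated_segs])
    return np.abs(true_v - gen_v).mean(axis=0)

# Toy usage: one individual whose generated WM mask covers half the true mask.
true_seg = np.zeros((3, 10, 10, 10))
true_seg[0] = 1.0            # 1000 WM voxels -> 1.0 mL at 1 mm^3 per voxel
gen_seg = np.zeros((3, 10, 10, 10))
gen_seg[0, :5] = 1.0         # 500 WM voxels -> 0.5 mL
mae = volume_mae_ml([true_seg], [gen_seg])  # WM error is 0.5 mL; GM and CSF are 0
```

Reporting one error per tissue, as above, mirrors the three separate MAE values quoted in the abstract; how the paper itself derives volumes from generated segmentations may differ from this sketch.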

Paper Structure

This paper contains 21 sections, 15 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: (a) An overview of how our proposed deep generative model conditionally generates individual-specific MRI segmentations. Given an individual with easily obtainable features (original or hypothetically modified for studies with control variables), we aim to develop a deep generative model that can generate synthetic 3D MRI segmentations specific to this individual. (b) An overview of the proposed approach for training the deep generative model. The training includes two steps: unconditional pre-training based on AOMIC ID1000 snoek2021amsterdam and conditional fine-tuning based on a subset of CamCAN shafto2014cambridge. The newly proposed CSegSynth model and its corresponding pre-trained $\alpha$-GAN model are highlighted with stars.
  • Figure 2: A comparison of real example segmentations and synthetic segmentations generated by our trained conditional models: C-VAE, C-GAN, C-LDM, and CSegSynth. We present center-cut 2D slices in the sagittal, axial, and coronal views for each 3D segmentation. Please note that, due to CamCAN's data restrictions, the real examples are sourced from a different public dataset (AOMIC) and are included solely to illustrate the general image quality of real segmentations.
  • Figure 3: Predicted trajectories of future WM and GM volumes for a test individual based on future segmentations generated by CSegSynth. We consider two scenarios: "ideal," where all features remain unchanged except for age, and "regressed," where the features are adjusted based on age-related regression. For each trajectory, we also plot a linear regression line with a 95% confidence interval.
  • Figure 4: An overview of the model structures of VAE, C-VAE, GAN, C-GAN, LDM, and C-LDM. The triplet margin loss of C-GAN is not shown for clarity. Please refer to the C-GAN subsection of the paper for details.
  • Figure 5: An overview of the encoder and decoder structures of the VAE model. The 3D convolutional (3D Conv) networks are labeled with kernel size (e.g., $3\times5\times5$) and input/output channels (e.g., $4/16$). The ConvNeXt DBLP:conf/cvpr/0003MWFDX22 blocks are labeled with the number of repetitions (e.g., $\times 3$). The 3D transposed convolutional (3D TransConv) network is labeled with the kernel size $10\times16\times16$. The upsampling layers are labeled with the upsampling factor $2\times$. Layer Norm and FFN refer to layer normalization ba2016layer and feedforward neural networks, respectively.
  • ...and 1 more figure