Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data

Ayodeji Ijishakin; Ana Lawry Aguila; Elizabeth Levitis; Ahmed Abdulaal; Andre Altmann; James Cole

TL;DR

This work addresses the challenge of multi-site neuroimaging harmonization by introducing the Disentangled Diffusion Autoencoder (DDAE), a diffusion-model framework that disentangles known covariates from unknown biological variance via separate latent representations. By conditioning the reverse diffusion process on both latents, DDAE can generate site-adjusted MR images that preserve biological variability while removing site-specific bias. In experiments across 7 sites on 4120 2D MR slices, DDAE achieves superior image quality (lowest FID), effective site removal, and robust preservation of age and sex information, while maintaining within-site variability, outperforming ComBat, a cVAE, and a Style-Encoding GAN. The approach demonstrates the potential of diffusion-based harmonization in neuroimaging and suggests extensions to 3D and vertex-wise data for broader applicability.
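
As a rough illustration of how site adjustment through the known covariates might look, the sketch below packs age, sex and site into a single covariate vector and then swaps only the site entry to a reference site before it would be passed to the known-variance encoder. The one-hot layout, the age scaling, the function names and every site label other than IXI are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

# Hypothetical site list: only IXI is named in the source; the other
# labels are placeholders for the remaining six sites.
SITES = ["IXI", "site_B", "site_C", "site_D", "site_E", "site_F", "site_G"]


def encode_covariates(age, sex, site):
    """Pack age, sex and a one-hot site label into one covariate vector c."""
    site_one_hot = np.zeros(len(SITES))
    site_one_hot[SITES.index(site)] = 1.0
    # Assumed encoding: age in years divided by 100, sex as 0/1.
    return np.concatenate(([age / 100.0, float(sex)], site_one_hot))


def swap_site(c, target_site):
    """Return a copy of c with the site one-hot replaced; age and sex are kept."""
    harmonized = c.copy()
    harmonized[2:] = 0.0
    harmonized[2 + SITES.index(target_site)] = 1.0
    return harmonized


# A subject scanned at a (hypothetical) non-reference site ...
c_original = encode_covariates(age=63, sex=1, site="site_C")
# ... is re-labelled as if scanned at the reference site.
c_harmonized = swap_site(c_original, target_site="IXI")
```

In a full model, `c_harmonized` would replace `c_original` as the encoder input while the image-derived latent stays fixed, so that only the site-related conditioning changes.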

Abstract

Combining neuroimaging datasets from multiple sites and scanners can help increase statistical power and thus provide greater insight into subtle neuroanatomical effects. However, site-specific effects pose a challenge by potentially obscuring the biological signal and introducing unwanted variance. Existing harmonization techniques, which use statistical models to remove such effects, have been shown to remove site effects incompletely while also failing to preserve biological variability. More recently, generative models based on GANs or autoencoders have been proposed for site adjustment. However, such methods are known for instability during training or blurry image generation. In recent years, diffusion models have become increasingly popular for their ability to generate high-quality synthetic images. In this work, we introduce the disentangled diffusion autoencoder (DDAE), a novel diffusion model designed for controlling specific aspects of an image. We apply the DDAE to the task of harmonizing MR images by generating high-quality site-adjusted images that preserve biological variability. We use data from 7 different sites and demonstrate the DDAE's superiority in generating high-resolution, harmonized 2D MR images over previous approaches. As far as we are aware, this work marks the first diffusion-based model for site adjustment of neuroimaging data.

Paper Structure (16 sections, 5 equations, 7 figures, 3 tables)

Introduction and related work
Background
Method
Disentangled Diffusion Autoencoders
Training Objective
Experiments
Datasets
Pre-processing
Training setup and Benchmarking
Results
Generating high-quality images.
Removing site effects.
Predicting known biological covariates.
Preserving within-site variability.
Conclusions and further work
...and 1 more section

Figures (7)

Figure 1: The disentangled diffusion autoencoder. Our image $\mathbf{x}_{0}$ goes through the unknown-variance encoder $s_{\phi}(\mathbf{x}_{0})$ to produce $\mathbf{z}_{\upsilon}$. Next, the known-variance encoder $f_{\psi}(\boldsymbol{c})$ produces $\mathbf{z}_{\kappa}$ conditional on age, sex and site. Our forward process $q(\mathbf{x}_{t}|\mathbf{x}_{0})$ noises our data $\mathbf{x}_{0}$ to produce $\mathbf{x}_{t}$. Our reverse process $p_{\theta}(\mathbf{x}_{t}|\mathbf{z}_{\kappa}, \mathbf{z}_{\upsilon})$ then recovers our data conditional on $\mathbf{z}_{\kappa}$ and $\mathbf{z}_{\upsilon}$.
Figure 2: Data pre-processing steps. The ANTs package was used to conduct affine registrations of the images to the MNI 152 brain template, after which they were resampled to 130 × 130 × 130 resolution. SimpleITK was used to perform N4 bias field correction, and HD-BET was used to skull-strip the images. Following pre-processing, 2D medial axial slices were taken from the 3D volumes, resized to 128 × 128 resolution, and had their pixel values normalised to lie between 0 and 1. The 2D images were then used to train the harmonization models.
Figure 3: Example reconstructions from our disentangled diffusion autoencoder (DDAE) compared to a cVAE. The first row shows original images from our dataset, the second row shows reconstructions from our model and the third row shows reconstructions from a cVAE.
Figure 4: UMAP embeddings of the joint latent space of $\mathbf{z}_{\kappa}$ and $\mathbf{z}_{\upsilon}$. The image on the left shows the latent space when we harmonise the images by setting their site to the same dataset (IXI). The right image shows the disentangled latent space when we use the actual site labels. In both instances, we control for sex and age.
Figure 5: Example images following harmonization. The first and third columns display images from two different datasets, and the second and fourth columns display the same images after harmonization. The harmonized images have their pixel intensity values increased accordingly whilst maintaining their original neuroanatomy.
...and 2 more figures
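
The forward process $q(\mathbf{x}_{t}|\mathbf{x}_{0})$ mentioned in the Figure 1 caption can be illustrated with the standard closed-form Gaussian noising used in denoising diffusion models, $\mathbf{x}_{t} = \sqrt{\bar{\alpha}_{t}}\,\mathbf{x}_{0} + \sqrt{1-\bar{\alpha}_{t}}\,\boldsymbol{\epsilon}$. The following is a minimal sketch under assumptions: the linear noise schedule, the number of steps and the function names are illustrative choices, and the schedule actually used for the DDAE is not stated in this summary.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Random stand-in for a 128 x 128 slice with pixel values in [0, 1].
x0 = rng.random((128, 128))

# Early timestep: x_t is still close to x_0.
x_early, _ = q_sample(x0, t=10, alpha_bar=alpha_bar, rng=rng)
# Final timestep: x_t is almost pure Gaussian noise.
x_late, _ = q_sample(x0, t=999, alpha_bar=alpha_bar, rng=rng)
```

The reverse process would be a learned network that undoes this noising while receiving $\mathbf{z}_{\kappa}$ and $\mathbf{z}_{\upsilon}$ as conditioning; that part is model-specific and is not sketched here.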
