Unpaired Volumetric Harmonization of Brain MRI with Conditional Latent Diffusion
Mengqi Wu, Minhui Yu, Shuaiming Jing, Pew-Thian Yap, Zhengwu Zhang, Mingxia Liu
TL;DR
This work tackles the challenge of site-related variations in multi-site brain MRI by proposing HCLD, an unpaired 3D harmonization framework that operates in a latent space. A pre-trained 3D autoencoder maps MRIs to a 4D latent representation, while a conditional latent diffusion model translates source latent maps to the target domain, conditioned on target style, enabling efficient volume-level harmonization without paired data. The method employs a two-stage training scheme, latent map fusion with AdaIN/IN, and content/style losses (including a Gram-based style loss), and uses DDIM-based inference for stability and speed. Across three datasets and three tasks, HCLD outperforms state-of-the-art methods in histogram alignment, site-age separation, and voxel-level fidelity, while maintaining anatomical integrity and facilitating downstream analyses. The approach offers a scalable, generalizable solution for harmonizing diverse MRI datasets, with potential extensions to additional sequences and clinically informed conditioning.
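To make the two style-related ingredients named above more concrete, the following is a minimal NumPy sketch of adaptive instance normalization (AdaIN) and a Gram-based style loss applied to 4D latent maps. It is an illustration under stated assumptions, not the authors' implementation: the latent layout (channels, depth, height, width), the function names `adain`, `gram_matrix`, and `gram_style_loss`, the normalization of the Gram matrix, and the toy random inputs are all hypothetical choices made for this example.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Re-style `content` with the per-channel statistics of `style`.

    Both inputs are assumed to be latent maps of shape (C, D, H, W).
    """
    axes = (1, 2, 3)  # spatial axes; statistics are computed per channel
    c_mean = content.mean(axis=axes, keepdims=True)
    c_std = content.std(axis=axes, keepdims=True) + eps
    s_mean = style.mean(axis=axes, keepdims=True)
    s_std = style.std(axis=axes, keepdims=True) + eps
    # Normalize the content latent, then rescale and shift to the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean

def gram_matrix(latent):
    """Channel-by-channel Gram matrix, averaged over spatial positions."""
    channels = latent.shape[0]
    flat = latent.reshape(channels, -1)
    return flat @ flat.T / flat.shape[1]

def gram_style_loss(generated, target):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(generated) - gram_matrix(target)
    return float(np.mean(diff ** 2))

# Toy latents standing in for encoded source and target MRIs (assumed shapes).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(4, 8, 8, 8))
target = rng.normal(2.0, 3.0, size=(4, 8, 8, 8))

fused = adain(source, target)
loss_before = gram_style_loss(source, target)
loss_after = gram_style_loss(fused, target)
```

In this toy setting, `fused` keeps the spatial pattern of `source` while taking on the per-channel mean and standard deviation of `target`, so its Gram matrix sits closer to the target's than the unmodified source's does. Plain instance normalization (IN) corresponds to the normalization step alone, without the rescaling to style statistics.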
Abstract
Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream applicability. Several image-level approaches focus on 2D slices, disregarding inherent volumetric information and leading to suboptimal outcomes. To this end, we propose a novel 3D MRI Harmonization framework through Conditional Latent Diffusion (HCLD) that explicitly considers image style and brain anatomy. It comprises a generalizable 3D autoencoder that encodes and decodes MRIs through a 4D latent space, and a conditional latent diffusion model that learns the latent distribution and generates harmonized MRIs that retain anatomical information from source MRIs while being conditioned on target image style. This enables efficient volume-level MRI harmonization through latent style translation, without requiring paired images from target and source domains during training. HCLD is trained and evaluated on 4,158 T1-weighted brain MRIs from three datasets in three tasks, assessing its ability to remove site-related variations while retaining essential biological features. Qualitative and quantitative experiments suggest the effectiveness of HCLD over several state-of-the-art methods.
