DIST-CLIP: Arbitrary Metadata and Image Guided MRI Harmonization via Disentangled Anatomy-Contrast Representations

Mehmet Yigit Avci, Pedro Borges, Virginia Fernandez, Paul Wright, Mehmet Yigitsoy, Sebastien Ourselin, Jorge Cardoso

TL;DR

MRI data heterogeneity caused by scanner and protocol variability hampers the generalization of clinical AI. DIST-CLIP is a unified MRI harmonization framework that disentangles anatomy from contrast and accepts guidance from either target images or DICOM metadata through MR-CLIP embeddings. An Adaptive Style Transfer module injects the target contrast in a controllable way while preserving anatomy, giving strong cross-contrast performance and zero-shot generalization to out-of-distribution cohorts such as OASIS-3. Across large real-world datasets, DIST-CLIP shows high reconstruction and anatomical fidelity. Code and weights are to be released upon publication.

Abstract

Deep learning holds immense promise for transforming medical image analysis, yet its clinical generalization remains profoundly limited. A major barrier is data heterogeneity. This is particularly true in Magnetic Resonance Imaging, where scanner hardware differences, diverse acquisition protocols, and varying sequence parameters introduce substantial domain shifts that obscure underlying biological signals. Data harmonization methods aim to reduce this instrumental and acquisition variability, but existing approaches remain insufficient. Image-based harmonization approaches are often restricted by the need for target images, while existing text-guided methods either rely on simplistic labels that miss complex acquisition details or are restricted to datasets with limited variability, and so fail to capture the heterogeneity of real-world clinical environments. To address these limitations, we propose DIST-CLIP (Disentangled Style Transfer with CLIP Guidance), a unified framework for MRI harmonization that flexibly uses either target images or DICOM metadata for guidance. Our framework explicitly disentangles anatomical content from image contrast, with the contrast representations extracted using pre-trained CLIP encoders. These contrast embeddings are then integrated into the anatomical content via a novel Adaptive Style Transfer module. We trained and evaluated DIST-CLIP on diverse real-world clinical datasets and showed significant improvements over state-of-the-art methods in both style translation fidelity and anatomical preservation, offering a flexible solution for style transfer and for standardizing MRI data. Our code and weights will be made publicly available upon publication.
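
The abstract does not spell out how the Adaptive Style Transfer module combines the two representations. As a rough illustration only, the sketch below uses adaptive instance normalization (AdaIN), a common style-injection technique that is swapped in here as an assumption and is not confirmed to be the authors' design. Each anatomy channel is normalized, then rescaled and shifted by values predicted from a contrast embedding. The function names, the weight matrices, and the toy shapes are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def instance_stats(channel: Sequence[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Mean and (eps-stabilized) standard deviation of one feature channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Plain dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def adain_modulate(anatomy: Sequence[Sequence[float]], style: Sequence[float],
                   w_scale, b_scale, w_shift, b_shift) -> List[List[float]]:
    """AdaIN-style fusion (an assumed stand-in for the paper's AST block).

    `anatomy` holds one flattened feature map per channel; `style` is a
    contrast embedding. Per-channel scale and shift are predicted from
    `style` and applied to the normalized anatomy features.
    """
    scales = linear(w_scale, b_scale, style)
    shifts = linear(w_shift, b_shift, style)
    fused = []
    for channel, gain, shift in zip(anatomy, scales, shifts):
        mean, std = instance_stats(channel)
        fused.append([gain * (v - mean) / std + shift for v in channel])
    return fused


# Toy inputs: 2 anatomy channels with 4 pixels each, 3-dimensional style vector.
anatomy = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
style = [0.5, -1.0, 2.0]
w_scale = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1]]
b_scale = [1.0, 1.0]
w_shift = [[0.2, 0.1, 0.0], [0.0, 0.0, 0.5]]
b_shift = [0.0, 0.0]

fused = adain_modulate(anatomy, style, w_scale, b_scale, w_shift, b_shift)
```

In this sketch the spatial pattern of each channel comes only from the anatomy input, while its mean and spread come only from the style vector. That separation is the property a disentangled anatomy-contrast design relies on, whatever the actual fusion mechanism is.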

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of the DIST-CLIP framework. (A) Overall architecture: the source image is processed by the Anatomy Mapper to extract a disentangled anatomical representation ($\beta_s$). In parallel, a style embedding ($\theta_i$ or $\theta_m$) is derived from either a target image or metadata using pre-trained CLIP encoders. The Style Fusion Decoder (SFD) integrates these anatomy and style representations to synthesize the final image with the desired appearance. (B) Detailed structure of the SFD, which adaptively fuses anatomical and style features through Adaptive Style Transfer (AST) blocks. (C) Loss suite used for training, enforcing anatomical preservation, reconstruction fidelity, and style consistency.
  • Figure 2: Quantitative evaluation of cross-contrast harmonization. Heatmaps show PSNR (top row) and SSIM (bottom row) for all available bidirectional translation tasks across T1w, T2w, PDw, and FLAIR MRI sequences. Rows represent the source contrast and columns the target contrast, comparing DIST-CLIP (image-guided /I and text-guided /T) against HACA3 and TUMSyn.
  • Figure 3: Qualitative assessment of cross-contrast harmonization. Outputs from baselines (HACA3, TUMSyn) are shown alongside the DIST-CLIP framework (text-guided /T and image-guided /I). Inset PSNR (dB) and SSIM scores confirm DIST-CLIP's high structural and visual fidelity.
  • Figure 4: Anatomical representations. The top row displays the source MR images (T1w, T2w, FLAIR, and T2*w, respectively), and the bottom row shows their corresponding contrast-invariant anatomical ($\beta$) representations.
  • Figure 5: Qualitative and quantitative results on the OOD (OASIS-3) dataset. (A) Visual comparison of harmonization performance. (B) Quantitative analysis of bidirectional translation ($n=34$) measured by PSNR and SSIM, demonstrating that DIST-CLIP performs on par with or better than state-of-the-art methods, even without being trained on this dataset.
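
Figures 2, 3, and 5 report PSNR in dB. For readers less familiar with the metric, the sketch below computes the standard definition, $\text{PSNR} = 10 \log_{10}(R^2 / \text{MSE})$, on flattened intensity lists. The `data_range` value $R$, the intensity normalization, and any foreground masking used in the paper are not stated in this summary, so the defaults here are assumptions.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference: Sequence[float], estimate: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; `data_range` is the assumed intensity span."""
    err = mse(reference, estimate)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / err)


# Toy example with intensities scaled to [0, 1]: every pixel is off by 0.1,
# so the MSE is about 0.01 and the PSNR is about 20 dB.
reference = [0.0, 0.5, 1.0, 0.25]
estimate = [0.1, 0.6, 0.9, 0.35]
score = psnr(reference, estimate)
```

Because PSNR depends on `data_range`, scores are only comparable across methods when all outputs share the same intensity scaling. SSIM, the second metric in the figures, is usually taken from an established library implementation rather than rewritten by hand.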