Quantitative Metrics for Benchmarking Medical Image Harmonization

Abhijeet Parida; Zhifan Jiang; Roger J. Packer; Robert A. Avery; Syed M. Anwar; Marius G. Linguraru

TL;DR

The study tackles the challenge of benchmarking medical image harmonization in MRI, where ground-truth data are scarce. It introduces three metrics, all designed to operate without ground truth: two intensity-harmonization measures based on the Wasserstein distance $WD$, in the normalized forms $nWD(i,p)$ and $nWD(t,p)$, and one anatomy-preservation metric $AP(i,p)$ derived from FreeSurfer segmentations. Here $i$, $t$, and $p$ denote the input, target, and predicted (harmonized) images. The authors validate these metrics against established image-quality metrics on a traveling phantom dataset and examine their behavior on real-world multi-site pediatric MRI data harmonized with GAN-based neural style transfer (NST). Results show that the proposed metrics correlate with traditional metrics and provide interpretable guidance on harmonization quality and anatomical preservation, supporting their adoption as a standardized benchmarking suite for medical image harmonization across scanners and protocols.
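To make the intensity metrics concrete, the sketch below computes a 1-D Wasserstein distance between intensity histograms and two normalized ratios. It is an illustration only, not the authors' implementation: the function names `intensity_wd` and `normalized_wd`, the 256-bin histogram, the [0, 1] intensity range, and in particular the choice to normalize by $WD(i,t)$ are assumptions, since this summary does not state how $nWD$ is normalized.

```python
import numpy as np


def intensity_wd(a, b, bins=256, value_range=(0.0, 1.0)):
    """Wasserstein-1 distance between the intensity histograms of two images.

    Assumes intensities are already scaled into `value_range` (an assumption
    made for this sketch, not something stated in the source).
    """
    hist_a, edges = np.histogram(np.ravel(a), bins=bins, range=value_range)
    hist_b, _ = np.histogram(np.ravel(b), bins=bins, range=value_range)
    if hist_a.sum() == 0 or hist_b.sum() == 0:
        raise ValueError("an image has no intensities inside value_range")
    # For 1-D distributions, W1 is the area between the two CDFs.
    cdf_a = np.cumsum(hist_a) / hist_a.sum()
    cdf_b = np.cumsum(hist_b) / hist_b.sum()
    bin_width = edges[1] - edges[0]
    return float(np.sum(np.abs(cdf_a - cdf_b)) * bin_width)


def normalized_wd(i, t, p, **kwargs):
    """Return (nWD(i,p), nWD(t,p)), each divided by WD(i,t).

    ASSUMPTION: dividing by WD(i,t) is one plausible normalization; the
    source text does not define the normalized forms.
    """
    base = intensity_wd(i, t, **kwargs)
    if base == 0.0:
        raise ValueError("input and target histograms are identical")
    return intensity_wd(i, p, **kwargs) / base, intensity_wd(t, p, **kwargs) / base
```

Under this assumed normalization, a prediction whose histogram matches the target gives $nWD(t,p) = 0$ and $nWD(i,p) = 1$, while a prediction that leaves the input unchanged gives the reverse.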

Abstract

Image harmonization is an important preprocessing strategy to address domain shifts arising from data acquired using different machines and scanning protocols in medical imaging. However, benchmarking the effectiveness of harmonization techniques has been a challenge due to the lack of widely available standardized datasets with ground truths. In this context, we propose three metrics: two intensity harmonization metrics and one anatomy preservation metric for medical images during harmonization, where no ground truths are required. Through extensive studies on a dataset with available harmonization ground truth, we demonstrate that our metrics are correlated with established image quality assessment metrics. We show how these novel metrics may be applied to real-world scenarios where no harmonization ground truth exists. Additionally, we provide insights into different interpretations of the metric values, shedding light on their significance in the context of the harmonization process. As a result of our findings, we advocate for the adoption of these quantitative harmonization metrics as a standard for benchmarking the performance of image harmonization techniques.

Paper Structure (12 sections, 2 equations, 3 figures, 4 tables)

Introduction
Evaluation of Image Harmonization
Intensity Harmonization
Anatomy Preservation
Experimental Setup
Ground Truth Evaluation
Real-world Scenario
Results & Interpretations
Ground Truth Evaluation and Comparison with Other Metrics
Performance on Real World Data
Discussions
Conclusion

Figures (3)

Figure 1: Image Harmonization: Schematic of the image intensity histogram manifold, showing how the metrics WD(i,p), WD(t,p), and WD(i,t) relate the input image (i), the target (t), and the predicted image (p).
Figure 2: Anatomy Preservation: Schematic of the AP(i,p) calculation for brain image harmonization using FreeSurfer v7 (FS7).
Figure 3: Qualitative results showing harmonization of images from Site A (input) to Site B (target), together with the predicted harmonized image.
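
The anatomy-preservation idea in Figure 2 can be sketched as an overlap score between the segmentation of the input image and the segmentation of the predicted image. The code below is a hypothetical illustration: it assumes the two label maps have already been produced (for example by FreeSurfer) and loaded as integer arrays, and it assumes a mean per-label Dice score as the overlap measure, which this summary does not confirm as the definition of AP(i,p). The name `mean_dice` is invented for the example.

```python
import numpy as np


def mean_dice(seg_i, seg_p, background=0):
    """Mean per-label Dice overlap between two integer label maps.

    ASSUMPTION: Dice overlap stands in for AP(i,p) here; the source only
    says the metric is derived from FreeSurfer segmentations.
    """
    seg_i = np.asarray(seg_i)
    seg_p = np.asarray(seg_p)
    if seg_i.shape != seg_p.shape:
        raise ValueError("label maps must have the same shape")
    # Score every non-background label that appears in either map.
    labels = np.union1d(np.unique(seg_i), np.unique(seg_p))
    labels = labels[labels != background]
    if labels.size == 0:
        raise ValueError("no foreground labels found")
    scores = []
    for label in labels:
        mask_i = seg_i == label
        mask_p = seg_p == label
        overlap = np.logical_and(mask_i, mask_p).sum()
        scores.append(2.0 * overlap / (mask_i.sum() + mask_p.sum()))
    return float(np.mean(scores))
```

With this choice, identical segmentations score 1 and segmentations with no shared labelled voxels score 0, so higher values indicate that harmonization altered the segmented anatomy less.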
