ComBAT Harmonization for diffusion MRI: Challenges and Best Practices
Pierre-Marc Jodoin, Manon Edde, Gabriel Girard, Félix Dumais, Guillaume Theaud, Matthieu Dumont, Jean-Christophe Houde, Yoan David, Maxime Descoteaux
TL;DR
This paper analyzes ComBAT harmonization for diffusion MRI, focusing on its linear data-generation model $y_{ijv}=\alpha_v+\mathbf{x}_{ij}^T\bm{\beta}_v+\gamma_{iv}+\delta_{iv}\varepsilon_{ijv}$, where $i$, $j$, and $v$ index the site, subject, and feature, and on the critical assumption that $\bm{\beta}_v$ is identical across sites. It shows that violations of the model assumptions, notably a site-dependent slope induced by a multiplicative factor $S_i$ or a biased variance $\delta_{iv}^2$, can lead to poor harmonization. To address this, it introduces Pairwise-ComBAT, which harmonizes each moving site to a fixed reference site, and a goodness-of-fit metric based on the Bhattacharyya distance that quantifies distribution overlap. Through experiments on CamCAN, Modified-CamCAN, ADNI, and NIMH, it derives practical recommendations on data inspection, covariate inclusion, sample size, age range, sex balance, and the handling of pathological populations, with the aim of improving reproducibility and clinical applicability.
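A minimal sketch of the pairwise idea, assuming a single covariate (age) and Gaussian noise: it simulates a reference site and a moving site from the linear model above with a shared slope, fits the covariate model on the reference site only, and rescales the moving site's residuals so their mean and spread match the reference residuals. It omits ComBAT's empirical Bayes shrinkage and is not the authors' implementation; all function names and parameter values are hypothetical. The overlap check uses the closed-form Bhattacharyya distance between two univariate Gaussians; applying it to residual distributions is our assumption about how such a goodness-of-fit measure could be built.

```python
import numpy as np


def simulate_site(n, alpha, beta, gamma, delta, rng):
    """Draw n subjects from y = alpha + age*beta + gamma + delta*eps, eps ~ N(0, 1)."""
    age = rng.uniform(20.0, 80.0, size=n)  # single covariate (hypothetical age range)
    eps = rng.normal(0.0, 1.0, size=n)
    return age, alpha + age * beta + gamma + delta * eps


def fit_linear(age, y):
    """OLS fit of y = a + b*age; returns intercept, slope and residuals."""
    X = np.column_stack([np.ones_like(age), age])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1], y - X @ coef


def pairwise_harmonize(age_ref, y_ref, age_mov, y_mov):
    """Map the moving site onto the reference site (location/scale only, no empirical Bayes)."""
    a_ref, b_ref, r_ref = fit_linear(age_ref, y_ref)  # slope assumed shared by both sites
    trend_mov = a_ref + b_ref * age_mov
    r_mov = y_mov - trend_mov                          # moving residuals under the reference model
    gamma_hat = r_mov.mean() - r_ref.mean()            # additive site effect
    delta_hat = r_mov.std(ddof=1) / r_ref.std(ddof=1)  # multiplicative site effect
    centered = (r_mov - r_mov.mean()) / delta_hat      # remove offset, rescale spread
    return trend_mov + r_ref.mean() + centered, gamma_hat, delta_hat


def bhattacharyya_gauss(x1, x2):
    """Bhattacharyya distance between univariate Gaussian fits of two samples."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))


rng = np.random.default_rng(0)
# Hypothetical sites: same alpha and beta, different additive (gamma) and multiplicative (delta) effects
age_r, y_r = simulate_site(300, alpha=0.7, beta=-0.002, gamma=0.00, delta=0.02, rng=rng)
age_m, y_m = simulate_site(300, alpha=0.7, beta=-0.002, gamma=0.05, delta=0.04, rng=rng)

y_h, gamma_hat, delta_hat = pairwise_harmonize(age_r, y_r, age_m, y_m)
a, b, r_ref = fit_linear(age_r, y_r)
d_before = bhattacharyya_gauss(r_ref, y_m - (a + b * age_m))
d_after = bhattacharyya_gauss(r_ref, y_h - (a + b * age_m))
print(f"gamma_hat={gamma_hat:.3f}  delta_hat={delta_hat:.2f}")
print(f"D_B before={d_before:.3f}  after={d_after:.2e}")
```

Because the sketch matches the first two moments of the residuals on the same data it was fitted on, the distance after harmonization is near zero by construction; held-out subjects would give a fairer check. The shared-slope step is also where a site-dependent slope would cause trouble: the reference slope is applied to the moving site unchanged, so any slope difference would remain as an age-dependent bias in the harmonized values.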
Abstract
Over the years, ComBAT has become the standard method for harmonizing MRI-derived measurements, owing to its ability to compensate for site-related additive and multiplicative biases while preserving biological variability. However, ComBAT relies on a set of assumptions that, when violated, can result in flawed harmonization. In this paper, we thoroughly review ComBAT's mathematical foundation, outline these assumptions, and explore their implications for the demographic composition needed for optimal results. Through a series of experiments involving a slightly modified version of ComBAT, called Pairwise-ComBAT, tailored for normative modeling applications, we assess the impact of various population characteristics, including population size, age distribution, the absence of certain covariates, and the magnitude of the additive and multiplicative factors. Based on these experiments, we present five essential recommendations that should be carefully considered to enhance consistency and support reproducibility, two key factors for open science, collaborative research, and real-life clinical deployment.
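For reference, under the linear model $y_{ijv}=\alpha_v+\mathbf{x}_{ij}^T\bm{\beta}_v+\gamma_{iv}+\delta_{iv}\varepsilon_{ijv}$ (site $i$, subject $j$, feature $v$), ComBAT's correction is commonly written as below, with hats denoting estimates and stars denoting the empirical Bayes estimates of the site effects. Substituting the model with exact estimates shows how the additive ($\gamma_{iv}$) and multiplicative ($\delta_{iv}$) site effects cancel while the covariate term carrying biological variability is kept; the last line, which assumes a true site-specific slope $\bm{\beta}_{iv}$ for illustration, shows the residual term left behind when the shared-slope assumption fails.

$$
\begin{aligned}
y^{\text{ComBAT}}_{ijv} &= \frac{y_{ijv}-\hat{\alpha}_v-\mathbf{x}_{ij}^T\hat{\bm{\beta}}_v-\hat{\gamma}^{*}_{iv}}{\hat{\delta}^{*}_{iv}}+\hat{\alpha}_v+\mathbf{x}_{ij}^T\hat{\bm{\beta}}_v \\
&= \varepsilon_{ijv}+\alpha_v+\mathbf{x}_{ij}^T\bm{\beta}_v \qquad \text{(exact estimates, shared } \bm{\beta}_v\text{)} \\
&= \varepsilon_{ijv}+\frac{\mathbf{x}_{ij}^T(\bm{\beta}_{iv}-\bm{\beta}_v)}{\delta_{iv}}+\alpha_v+\mathbf{x}_{ij}^T\bm{\beta}_v \qquad \text{(true slope } \bm{\beta}_{iv}\neq\bm{\beta}_v\text{)}
\end{aligned}
$$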
