Robust-ComBat: Mitigating Outlier Effects in Diffusion MRI Data Harmonization

Yoan David; Pierre-Marc Jodoin; Alzheimer's Disease Neuroimaging Initiative; The TRACK-TBI Investigators

Abstract

Harmonization methods such as ComBat and its variants are widely used to mitigate site-specific biases in diffusion MRI (dMRI). However, ComBat assumes that subject distributions are Gaussian. In practice, patients with neurological disorders often present diffusion metrics that deviate markedly from those of healthy controls, introducing pathological outliers that distort site-effect estimation. This problem is particularly challenging in clinical practice: most patients undergoing brain imaging have an underlying, yet undiagnosed, condition, and they are difficult to exclude from harmonization cohorts because their scans were prescribed precisely to establish a diagnosis. In this paper, we show that harmonizing data to a normative reference population with ComBat while including pathological cases induces significant distortions. Across 7 neurological conditions, we evaluated 10 outlier rejection methods combined with 4 ComBat variants over a wide range of scenarios, revealing that many filtering strategies fail in the presence of pathology. In contrast, a simple MLP provides robust outlier compensation, enabling reliable harmonization while preserving disease-related signal. Experiments on both control sites (built by resampling subjects) and real multi-site cohorts, comprising up to 80% of subjects with neurological disorders, demonstrate that Robust-ComBat consistently outperforms conventional statistical baselines, with lower harmonization error across all ComBat variants.
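The distortion described in the abstract can be reproduced with a deliberately simplified location/scale model. The sketch below is not ComBat itself: it omits covariates (e.g. age) and the empirical Bayes shrinkage, and simply maps a site's mean and standard deviation onto those of a normative reference. The helper `location_scale_harmonize`, the site effect, the 50% patient fraction, and the -1.31 SD shift (borrowed only in magnitude from Figure 2) are all hypothetical choices for illustration.

```python
import numpy as np

def location_scale_harmonize(values, fit_mask, ref_mean, ref_std):
    """Hypothetical helper: simplified ComBat-like mapping (no covariates, no empirical Bayes).

    The site mean/SD are estimated only on subjects where fit_mask is True,
    then the same affine map is applied to every subject of the site.
    """
    fit = values[fit_mask]
    site_mean, site_std = fit.mean(), fit.std(ddof=1)
    return (values - site_mean) / site_std * ref_std + ref_mean

rng = np.random.default_rng(0)
ref_mean, ref_std = 0.0, 1.0        # normative reference (arbitrary units)
site_offset, site_scale = 0.8, 1.3  # assumed scanner effect for this site
shift = -1.31                       # assumed pathological shift, in reference SD units

n = 200
is_patient = np.arange(n) < n // 2  # site composed of 50% patients
truth = rng.normal(ref_mean, ref_std, n) + shift * is_patient
observed = truth * site_scale + site_offset

# Site effects estimated on everyone vs. on healthy controls only.
naive = location_scale_harmonize(observed, np.ones(n, dtype=bool), ref_mean, ref_std)
hc_only = location_scale_harmonize(observed, ~is_patient, ref_mean, ref_std)

for name, h in (("all subjects", naive), ("HC only", hc_only)):
    print(f"{name}: HC mean {h[~is_patient].mean():.2f}, patient mean {h[is_patient].mean():.2f}")
```

Fitting on all subjects pushes the harmonized HC mean above the reference and pulls patients toward it, shrinking the HC-patient gap; fitting on HC only keeps HC centred on the reference and preserves the pathological shift.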

This paper contains 9 sections, 19 equations, 14 figures, and 2 tables.

Z-score (ZS) [Iglewicz1993Outliers]
Interquartile Range (IQR) [Tukey1977]
Median Absolute Deviation (MAD) [Leys2013]
Rousseeuw-Croux Estimators (Sn, Qn) [Rousseeuw1993]
Mean-Median Shift (MMS)
Variance Symmetry (VS)
Global Z-score (G_ZS)
Global MAD (G_MAD)
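Several of the per-site statistical filters listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the thresholds (3 for the classical z-score, Tukey's k = 1.5, 3.5 for the modified z-score, 3 for Sn/Qn distances) are common textbook conventions assumed here, and `sn_scale` and `qn_scale` are naive O(n^2) versions using plain medians and without the finite-sample correction factors of Rousseeuw and Croux. All function names are hypothetical.

```python
import numpy as np

def zscore_mask(x, thresh=3.0):
    """Classical z-score filter: keep points with |x - mean| / SD below thresh."""
    z = (x - x.mean()) / x.std(ddof=1)
    return np.abs(z) < thresh

def iqr_mask(x, k=1.5):
    """Tukey fences: keep points inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

def mad_mask(x, thresh=3.5):
    """Modified z-score (Iglewicz-Hoaglin style): 0.6745 * |x - median| / MAD."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 0.6745 * np.abs(x - med) / mad < thresh

def sn_scale(x):
    """Naive Sn: c * median_i median_j |x_i - x_j| with plain medians, c = 1.1926."""
    diffs = np.abs(x[:, None] - x[None, :])
    return 1.1926 * np.median(np.median(diffs, axis=1))

def qn_scale(x):
    """Naive Qn: d * k-th smallest pairwise distance, k = C(h, 2), h = n // 2 + 1."""
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    pair_dists = np.sort(np.abs(x[i] - x[j]))
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return 2.2219 * pair_dists[k - 1]

def robust_scale_mask(x, scale_fn, thresh=3.0):
    """Keep points whose distance to the median is below thresh robust-scale units."""
    return np.abs(x - np.median(x)) / scale_fn(x) < thresh

# Toy site: seven similar values and one strong outlier (hypothetical data).
x = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 5.0])
for name, mask in [
    ("ZS", zscore_mask(x)),
    ("IQR", iqr_mask(x)),
    ("MAD", mad_mask(x)),
    ("Sn", robust_scale_mask(x, sn_scale)),
    ("Qn", robust_scale_mask(x, qn_scale)),
]:
    print(name, "kept", int(mask.sum()), "of", len(x))
```

On this toy sample the classical z-score keeps every point: with n = 8 and the sample standard deviation, no |z| can exceed (n - 1)/sqrt(n), roughly 2.47, so the single outlier inflates the SD enough to mask itself. The median-based filters reject it.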
Figures S1 to S8 complement Figure 4 in the paper.

Figures (14)

Figure 1: Illustration of the effect of mixing healthy controls (HC, green shading) and pathological subjects (TBI, red dots) from a given site on ComBat harmonization of AFD in the left IFOF. Prior to harmonization, the HC distribution from this site deviates markedly from the normative reference (gray). When HC and TBI subjects from a site composed of 50% TBI patients are indiscriminately included in harmonization (top right), both populations are artificially compressed toward the normative distribution. In contrast, when pathological outliers are filtered out before site effects are estimated, harmonization is near-optimal: HC distributions align with the normative reference, while TBI subjects remain consistently shifted below it.
Figure 2: Illustration of pathology-driven shifts in dMRI metric distributions. [Top] AD subjects have a 1.02 SD increase in FW within the anterior commissure. [Bottom] TBI patients show a 1.31 SD decrease in AFD within the left IFOF.
Figure 3: (1) All available sites are first harmonized toward the CamCAN reference to construct a unified harmonized dataset. (2) The harmonized data are then split into two independent subsets, one used to train the MLP-based outlier detector and the other used to evaluate harmonization performance. (3) Control sites are generated by sampling subjects into multiple sites with varying proportions of diseased patients.
Figure 4: (a) Global harmonization performance. (I) STD-MAE averaged across all diffusion metrics and bundles. (II) STD-MAE computed for FA across bundles. (III) STD-MAE computed for MD across bundles. As the proportion of pathological subjects increases, classical statistical approaches exhibit increasing error, whereas MLP-based filtering remains more stable and closer to the HC baseline. (b) Outlier-handling strategies applied prior to harmonization for the FW metric in the right uncinate fasciculus (UF right), shown as a function of age, within a site containing 50% pathological subjects. Shown are raw data, no filtering, HC-only filtering, MLP-based filtering, and statistical approaches (G_ZS, MAD, MMS).
Figure 5: STD-MAE measured at different disease ratios (50%, 70%, and 80%) for MLP, HC, and NO_FILTERING across increasing total numbers of patients (20 to 60).
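The control-site construction of Figure 3 and the disease ratios of Figure 5 can be mimicked by sampling subjects without replacement from separate HC and patient pools, and the SD-unit shifts quoted in Figure 2 correspond to a standardized mean difference. This is a minimal sketch under stated assumptions: `sample_control_site` and `sd_shift` are hypothetical helpers, the pools are made-up integer IDs, `n_subjects` is interpreted as the total site size, and standardizing by the HC standard deviation is an assumption about how the Figure 2 shifts are expressed.

```python
import numpy as np

def sd_shift(patient_values, hc_values):
    """Assumed definition: mean patient shift in units of the HC standard deviation."""
    return (np.mean(patient_values) - np.mean(hc_values)) / np.std(hc_values, ddof=1)

def sample_control_site(hc_ids, patient_ids, n_subjects, disease_ratio, rng):
    """Hypothetical helper: draw a site of n_subjects with a given patient fraction."""
    n_pat = int(round(disease_ratio * n_subjects))
    n_hc = n_subjects - n_pat
    # Sampling without replacement so no subject appears twice in the same site.
    pat = rng.choice(patient_ids, size=n_pat, replace=False)
    hc = rng.choice(hc_ids, size=n_hc, replace=False)
    return hc, pat

rng = np.random.default_rng(42)
hc_pool = np.arange(1000)              # hypothetical HC subject IDs
patient_pool = np.arange(1000, 1400)   # hypothetical patient IDs

# Grid of disease ratios and site sizes, loosely following Figure 5.
for ratio in (0.5, 0.7, 0.8):
    for n_subjects in (20, 40, 60):
        hc, pat = sample_control_site(hc_pool, patient_pool, n_subjects, ratio, rng)
        print(f"ratio={ratio:.0%} size={n_subjects}: {len(hc)} HC, {len(pat)} patients")
```

Keeping the HC and patient pools disjoint makes the ground-truth label of every sampled subject known, which is what allows an HC-only baseline to be compared against unfiltered and filtered harmonization.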