Statistical quantification of confounding bias in predictive modelling
Tamas Spisak
TL;DR
The paper tackles confounding bias in predictive modelling by introducing two tests based on conditional permutation testing (CPT): the partial confounder test and the full confounder test. For a given confounder, the partial test asks whether the model's predictions still depend on the confounder once the outcome is accounted for (null hypothesis: unconfounded model), while the full test asks whether the predictions carry any information about the outcome beyond the confounder (null hypothesis: fully confounded model). Both tests model the required conditional distributions with generalized additive models (GAMs) or multinomial logistic regression, need no re-fitting of the predictive model, and maintain Type I error control under non-normal and non-linear dependence. They use an $R^2$-based test statistic and a parallel-pairwise MCMC CPT sampler to generate valid null distributions. The tests are demonstrated on simulated data and on real neuroimaging datasets (HCP and ABIDE), where they identify and quantify confounding by age, acquisition batch, center, and motion, and benchmark mitigation approaches. The accompanying mlconfound package makes the tests practical to apply, with the aim of improving the generalizability and neurobiological validity of predictive biomarkers derived from functional connectivity data.
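The two null hypotheses can be written compactly as conditional independence statements. The notation is an assumption introduced here for illustration: $y$ is the target (outcome), $\hat{y}$ the model's prediction, and $c$ the confounder.

$$
\begin{aligned}
\text{partial confounder test:}\quad & H_0:\ \hat{y} \perp c \mid y && \text{(unconfounded model)}\\
\text{full confounder test:}\quad & H_0:\ \hat{y} \perp y \mid c && \text{(fully confounded model)}
\end{aligned}
$$

Rejecting the first null suggests that the predictions carry confounder-related variance not explained by the target; rejecting the second suggests that the predictions are not driven by the confounder alone.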
Abstract
The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively. The tests provide strict control of Type I errors and high statistical power, even for the non-normally and non-linearly dependent predictions often seen in machine learning. Applying the proposed tests to models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange dataset reveals confounders that were previously unreported or were found to be hard to correct for with state-of-the-art confound mitigation approaches. The tests, implemented in the package mlconfound (https://mlconfound.readthedocs.io), can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and thereby foster the development of clinically useful machine learning biomarkers.
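The logic of the partial confounder test can be illustrated with a deliberately simplified sketch. It is not the mlconfound implementation and not the sampler used in the paper: instead of an MCMC-based conditional permutation with GAM-estimated conditional densities, it shuffles the confounder within quantile bins of the target, which only approximates the conditional null and can be miscalibrated when the bins are coarse. The function name, the binning scheme, and all parameters are hypothetical; the squared Pearson correlation stands in for the $R^2$-based statistic.

```python
import numpy as np


def r2(a, b):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


def stratified_partial_confound_test(y, yhat, c, n_bins=10, n_perm=1000, seed=0):
    """Crude stand-in for a conditional permutation test of H0: yhat independent of c given y.

    ASSUMPTION: conditioning on y is approximated by permuting c only
    within quantile bins of y (hypothetical simplification).
    """
    rng = np.random.default_rng(seed)
    y, yhat, c = (np.asarray(v, dtype=float) for v in (y, yhat, c))

    # Assign each sample to a quantile bin of the target y.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    strata = np.digitize(y, edges)
    groups = [np.flatnonzero(strata == s) for s in np.unique(strata)]

    # Observed association between predictions and confounder.
    t_obs = r2(yhat, c)

    # Null distribution: shuffle c within each y-bin, so the c-y relation
    # is roughly preserved while any direct c-yhat link is broken.
    exceed = 0
    for _ in range(n_perm):
        c_perm = c.copy()
        for idx in groups:
            c_perm[idx] = c[rng.permutation(idx)]
        exceed += int(r2(yhat, c_perm) >= t_obs)

    # Permutation p-value with the usual +1 correction.
    p_value = (1 + exceed) / (1 + n_perm)
    return t_obs, p_value


if __name__ == "__main__":
    # Simulated example (made-up data): the confounder c leaks directly
    # into the predictions, beyond its effect on the target y.
    rng = np.random.default_rng(1)
    c = rng.normal(size=500)
    y = 0.5 * c + rng.normal(size=500)
    yhat = y + c + 0.5 * rng.normal(size=500)
    print(stratified_partial_confound_test(y, yhat, c, n_perm=200))
```

A small p-value here indicates that the predictions are associated with the confounder more strongly than the target alone would explain. For real analyses, the package referenced in the abstract should be preferred over this sketch.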
