Efficient Identification of Direct Causal Parents via Invariance and Minimum Error Testing
Minh Nguyen, Mert R. Sabuncu
TL;DR
This work addresses scalable local causal discovery under distribution shifts by improving invariance-based methods. It introduces MMSE-ICP and fastICP, two algorithms that use a minimum-mean-squared-error (MMSE) inequality to identify direct causal parents with substantially fewer tests than classic ICP, while offering identifiability guarantees under plausible assumptions. In extensive simulations and a large-scale gene expression study, the methods outperform baselines and achieve state-of-the-art results, demonstrating both accuracy and scalability. The approach may enable robust causal variable identification in high-dimensional, partially perturbed systems and could inform resilient representation learning and domain-general ML models.
Abstract
Invariant causal prediction (ICP) is a popular technique for finding the causal parents (direct causes) of a target by exploiting distribution shifts and invariance testing (Peters et al., 2016). However, because ICP must run an exponential number of tests and fails to identify parents when distribution shifts affect only a few variables, applying it to practical large-scale problems is challenging. We propose MMSE-ICP and fastICP, two approaches that employ an error inequality to address the identifiability problem of ICP. The inequality states that, among all predictors that do not use descendants of the target, the predictor using the causal parents attains the smallest minimum prediction error. fastICP is an efficient approximation tailored to large problems: it exploits the inequality and a heuristic to run fewer tests. MMSE-ICP and fastICP not only outperform competitive baselines in many simulations but also achieve state-of-the-art results on a large-scale real-data benchmark.
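The error inequality can be checked numerically on a toy example. The sketch below is not the authors' MMSE-ICP or fastICP procedure; it assumes a hypothetical linear Gaussian structural model (variables A, P1, P2, D and all coefficients are made up for illustration) and uses ordinary least squares as the predictor. It compares the mean squared error of every subset of non-descendants with that of the parent set {P1, P2}, and also shows that adding the descendant D lowers the error further, which is why the inequality is restricted to predictors that do not use descendants.

```python
from itertools import combinations

import numpy as np


def simulate(n=20000, seed=0):
    """Draw samples from an assumed toy linear SCM (illustrative only).

    A -> P1 -> Y <- P2, and Y -> D, so the parents of Y are {P1, P2},
    A is a non-parent ancestor, and D is a descendant.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)
    p1 = 0.8 * a + rng.normal(size=n)
    p2 = rng.normal(size=n)
    y = 2.0 * p1 - 1.0 * p2 + rng.normal(size=n)  # unit-variance noise on Y
    d = y + 0.5 * rng.normal(size=n)
    return {"A": a, "P1": p1, "P2": p2, "D": d}, y


def ols_mse(data, y, subset):
    """In-sample MSE of a least-squares fit of y on `subset` plus an intercept."""
    columns = [np.ones_like(y)] + [data[name] for name in subset]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return float(np.mean(residual ** 2))


def mse_by_subset(data, y, names):
    """MSE for every subset of `names`, keyed by a tuple of variable names."""
    return {
        subset: ols_mse(data, y, subset)
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    }


if __name__ == "__main__":
    data, y = simulate()
    # Only non-descendants of Y are candidates here.
    table = mse_by_subset(data, y, ["A", "P1", "P2"])
    for subset, mse in sorted(table.items(), key=lambda item: item[1]):
        print(subset, round(mse, 3))
    # Including the descendant D reduces the error below the parent set's.
    print(("P1", "P2", "D"), round(ols_mse(data, y, ("P1", "P2", "D")), 3))
```

In this toy setting, supersets of the parent set that contain only non-descendants match the parents' error up to sampling noise, so the minimum-error criterion alone does not single out the parents from such supersets. Enumerating all subsets, as done here for three candidates, is also the step that grows exponentially with the number of variables.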
