Causal discovery with endogenous context variables
Wiebke Günther, Oana-Iuliana Popescu, Martin Rabel, Urmi Ninad, Andreas Gerhardus, Jakob Runge
TL;DR
The authors address causal discovery when context variables that modulate causal mechanisms may be endogenous, complicating traditional methods. They introduce an adaptive constraint-based algorithm that tests independence either on context-specific data or on pooled data to recover per-context and union graphs, tying results to a formal SCM framework with descriptive, physical, and counterfactual graphs. Under suitable sufficiency and faithfulness assumptions, the method is sound and yields interpretable context-specific causal changes while mitigating selection bias. Simulation results demonstrate improved finite-sample performance over masking or pooling baselines and reveal the method’s limitations with large cycles and uncertain context-system links. The work enables robust inference of context-dependent mechanisms with endogenous contexts and points to extensions for time-series data and richer orientation rules.
Abstract
Causal systems often exhibit variations in the underlying causal mechanisms between the variables of the system. These changes are frequently driven by different environments or internal states in which the system operates, and we refer to variables that indicate such a change in causal mechanisms as context variables. An example is the set of causal relations in soil moisture-temperature interactions and their dependence on soil moisture regimes: dry soil triggers a dependence of soil moisture on latent heat, while environments with wet soil do not feature such a feedback, making it a context-specific property. Crucially, a regime or context variable such as soil moisture need not be exogenous and can be influenced by the dynamical system variables (precipitation can make a dry soil wet), leading to joint systems with endogenous context variables. In this work, we investigate the assumptions for constraint-based causal discovery of context-specific information in systems with endogenous context variables. We show that naive approaches, such as learning different regime graphs on masked data or pooling all data, can lead to uninformative results. We propose an adaptive constraint-based discovery algorithm and give a detailed discussion of the connection to structural causal models, including sufficiency assumptions, which allow us to prove the soundness of our algorithm and to interpret the results causally. Numerical experiments demonstrate the performance of the proposed method relative to alternative baselines, but they also reveal current limitations of our method.
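
As a toy illustration of why masking on an endogenous context variable can mislead, the sketch below simulates two independent variables and a binary context that both of them influence. It is not the authors' algorithm or data: the variable names, the noise scale, and the threshold at zero are assumptions chosen only for illustration. Restricting the sample to one context value conditions on a common effect, which induces a dependence that has no causal counterpart.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Draw (x, y, r) where x and y are independent and r depends on both.

    r plays the role of an endogenous context variable (hypothetical setup):
    it is a child of the system variables rather than an external label.
    """
    rng = random.Random(seed)
    xs, ys, rs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)  # no causal link between x and y
        r = 1 if x + y + rng.gauss(0.0, 0.5) > 0 else 0
        xs.append(x)
        ys.append(y)
        rs.append(r)
    return xs, ys, rs


xs, ys, rs = simulate()

# Pooled data: x and y are (correctly) found to be nearly uncorrelated.
pooled_corr = pearson(xs, ys)

# Masked data: keep only samples from context r == 1.
masked_x = [x for x, r in zip(xs, rs) if r == 1]
masked_y = [y for y, r in zip(ys, rs) if r == 1]
masked_corr = pearson(masked_x, masked_y)

print(f"pooled correlation: {pooled_corr:.3f}")
print(f"masked correlation (r == 1): {masked_corr:.3f}")
```

In this setup the pooled correlation stays close to zero, while the masked correlation is clearly negative. An independence test run only on the masked sample would therefore report a link between the two variables that is purely a selection effect.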
