On the Complexity of Identification in Linear Structural Causal Models
Julian Dörfler, Benito van der Zander, Markus Bläser, Maciej Liskiewicz
TL;DR
This work studies the computational complexity of identifying parameters in linear structural causal models (linear SCMs). It presents a sound and complete polynomial-space algorithm for generic identifiability, which yields an exponential-time algorithm that improves on previous double-exponential Gröbner-basis approaches, and shows that generic identifiability lies in $\exists\forall\mathbb{R}$ (and in $\forall\exists\mathbb{R}$). It further proves that numerical identifiability is $\forall\mathbb{R}$-hard (in particular $\mathsf{coNP}$-hard) while remaining decidable in polynomial space, and extends the results to cyclic graphs. The paper thus sharpens the complexity landscape for identifiability in linear SCMs, providing both upper bounds via real-algebraic techniques and hardness results, with implications for algorithmic causal inference and related symbolic approaches.
Abstract
Learning the unknown causal parameters of a linear structural causal model is a fundamental task in causal analysis. This task, known as the problem of identification, is to estimate the parameters of the model from a combination of assumptions on the graphical structure of the model and observational data, represented as a non-causal covariance matrix. In this paper, we give a new sound and complete algorithm for generic identification that runs in polynomial space. By standard simulation results, this algorithm has exponential running time, which vastly improves on the state-of-the-art double-exponential-time method based on Gröbner bases. The paper also presents evidence that parameter identification is computationally hard in general. In particular, we prove that the task of deciding whether, for a given feasible correlation matrix, exactly one parameter set or at least two parameter sets explain the observed matrix is hard for $\forall\mathbb{R}$, the co-class of the existential theory of the reals. Consequently, this problem is also $\mathsf{coNP}$-hard. To the best of our knowledge, this is the first hardness result for any notion of identifiability.
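To make the identification task concrete, the following is a minimal sketch, assuming the standard parametrization of an acyclic linear SCM in which $\Lambda_{ij}$ is the weight of the edge $i \to j$, $\Omega$ is the error covariance (nonzero off-diagonal entries for bidirected edges), and $\Sigma = (I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1}$. It is not the paper's polynomial-space algorithm; it only illustrates what "recovering parameters from $\Sigma$" and "two or more parameter sets explaining $\Sigma$" mean. The graphs, weights, and helper names (`covariance`, `inv_i_minus`, etc.) are hypothetical choices for illustration.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv_i_minus(Lam):
    """(I - Lam)^{-1} for an acyclic graph: Lam is nilpotent, so the
    Neumann series I + Lam + ... + Lam^(n-1) is exact."""
    n = len(Lam)
    result, power = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, Lam)
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result

def covariance(Lam, Omega):
    """Sigma = (I - Lam)^{-T} Omega (I - Lam)^{-1}; Lam[i][j] weights edge i -> j."""
    M = inv_i_minus(Lam)
    return matmul(matmul(transpose(M), Omega), M)

h = Fraction(1, 2)

# Hypothetical instrumental-variable graph Z -> X -> Y with a bidirected
# edge X <-> Y (variables ordered Z, X, Y; weights chosen for illustration).
LAM_IV = [[0, 2, 0], [0, 0, 3], [0, 0, 0]]
OMEGA_IV = [[1, 0, 0], [0, 1, h], [0, h, 1]]
S = covariance(LAM_IV, OMEGA_IV)
# The weight of X -> Y is recovered from Sigma alone as sigma_ZY / sigma_ZX.
lambda_xy = S[0][2] / S[0][1]
print(lambda_xy)  # 3

# Without the instrument, X -> Y plus X <-> Y is not identified:
# two different parameter sets produce the same covariance matrix.
LAM_A, OMEGA_A = [[0, 3], [0, 0]], [[1, h], [h, 1]]
LAM_B, OMEGA_B = [[0, 1], [0, 0]], [[1, Fraction(5, 2)], [Fraction(5, 2), 7]]
print(covariance(LAM_A, OMEGA_A) == covariance(LAM_B, OMEGA_B))  # True
```

Exact rational arithmetic via `Fraction` is used so that equality of covariance matrices can be checked without floating-point tolerance; the Neumann-series inverse relies on the graph being acyclic and would not apply to the cyclic case mentioned in the TL;DR.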
