Sample Complexity of Causal Identification with Temporal Heterogeneity
Ameya Rathod, Sujay Belsare, Salvik Krishna Nautiyal, Dhruv Laad, Ponnurangam Kumaraguru
TL;DR
The paper addresses learning causal structure from non-stationary observational data by jointly leveraging temporal heteroskedasticity and multi-environment shifts. It extends identifiability analysis to heavy-tailed noise (multivariate Student's $t$) and derives finite-sample guarantees, showing that the identifiability conditions match the Gaussian case while the sample complexity incurs a tail-dependent penalty. A key theoretical result is that, under rank-deficient heterogeneity, identifiability from second-order information requires a temporal window of length $T \ge \lceil d/r \rceil$, where $d$ is the number of variables and $r$ is the rank of the heterogeneity. The information-theoretic lower bounds further reveal an intrinsic sample-efficiency penalty of $1 + \frac{3}{\nu-4}$ for $\nu > 4$ degrees of freedom. Empirical results on synthetic data corroborate the theory, identifying eigenvalue crowding as a practical bottleneck and validating the heavy-tail penalty in covariance-based recovery. Overall, the work shifts the focus from identifiability alone to practical recoverability, providing a principled framework for covariance-based causal discovery in non-stationary, heavy-tailed systems.
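For integer $T$, the window condition $T \ge \lceil d/r \rceil$ is equivalent to $T r \ge d$. One intuitive reading, which is our assumption for illustration and not the paper's proof, is that each time step contributes at most $r$ new directions of variance change, so at least $d$ such directions must accumulate over the window. A minimal Python sketch of the arithmetic, using a hypothetical helper `min_window`:

```python
def min_window(d: int, r: int) -> int:
    """Smallest integer T with T * r >= d, i.e. ceil(d / r).

    d: number of variables; r: rank of the per-step heterogeneity.
    Illustrative reading (assumption): each step adds at most r new
    directions, and d directions are needed in total.
    """
    if d <= 0 or r <= 0:
        raise ValueError("d and r must be positive integers")
    # Integer ceiling division; avoids floating-point rounding for large d.
    return -(-d // r)


if __name__ == "__main__":
    for d, r in [(10, 3), (10, 10), (50, 1), (7, 2)]:
        T = min_window(d, r)
        # T * r reaches d, while (T - 1) * r falls short of it.
        print(f"d={d}, r={r}: T >= {T} (T*r={T * r}, (T-1)*r={(T - 1) * r})")
```

For example, with $d = 10$ and $r = 3$ the sketch returns $T = 4$, since $3 \cdot 3 = 9 < 10 \le 4 \cdot 3$; full-rank heterogeneity ($r = d$) gives $T = 1$.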
Abstract
Recovering a unique causal graph from observational data is an ill-posed problem because multiple generating mechanisms can induce the same observational distribution. The problem becomes solvable only by exploiting specific structural or distributional assumptions. While recent work has separately used time-series dynamics or multi-environment heterogeneity to constrain this problem, we integrate both as complementary sources of heterogeneity. This integration yields unified necessary identifiability conditions and enables a rigorous analysis of the statistical limits of recovery under thin- versus heavy-tailed noise. In particular, we show that temporal structure can effectively substitute for missing environmental diversity, potentially achieving identifiability even when environmental heterogeneity alone is insufficient. Extending this analysis to heavy-tailed (Student's $t$) distributions, we demonstrate that while the geometric identifiability conditions remain unchanged, the sample complexity diverges significantly from the Gaussian baseline. Explicit information-theoretic bounds quantify this cost of robustness, establishing the fundamental limits of covariance-based causal graph recovery methods in realistic non-stationary systems. This work shifts the focus from whether causal structure is identifiable to whether it is statistically recoverable in practice.
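One way to see where a factor of the form $1 + \frac{3}{\nu-4}$ can arise is a univariate illustration, which we assume for exposition and which is not the paper's multivariate derivation. For a Student's $t$ variable with $\nu > 4$, the kurtosis is $3(\nu-2)/(\nu-4)$, and the asymptotic variance of the sample variance from $n$ draws is $(\mu_4 - \sigma^4)/n$. Dividing by the Gaussian value $2\sigma^4/n$ gives $(\text{kurtosis} - 1)/2 = \frac{\nu-1}{\nu-4} = 1 + \frac{3}{\nu-4}$. The Python sketch below (all function names are hypothetical) computes this ratio exactly from the closed-form even moments of the $t$ distribution and includes a noisy, standard-library Monte Carlo check:

```python
from fractions import Fraction
import math
import random


def t_even_moment(nu: int, k: int) -> Fraction:
    """Exact E[T^(2k)] for a standard Student's t with nu degrees of freedom.

    Uses E[T^(2k)] = nu^k * prod_{i=1..k} (2i - 1) / (nu - 2i),
    which is finite only when nu > 2k.
    """
    if nu <= 2 * k:
        raise ValueError("moment of order 2k requires nu > 2k")
    m = Fraction(nu) ** k
    for i in range(1, k + 1):
        m *= Fraction(2 * i - 1, nu - 2 * i)
    return m


def variance_penalty(nu: int) -> Fraction:
    """Asymptotic Var(s^2) / sigma^4 under Student's t, relative to Gaussian.

    Univariate illustration (assumption): Var(s^2) ~ (mu4 - sigma^4) / n,
    so the ratio to the Gaussian value 2 * sigma^4 / n is (kurtosis - 1) / 2.
    """
    kurtosis = t_even_moment(nu, 2) / t_even_moment(nu, 1) ** 2
    return (kurtosis - 1) / 2


def closed_form_penalty(nu: int) -> Fraction:
    """The factor 1 + 3 / (nu - 4) quoted in the text (requires nu > 4)."""
    return 1 + Fraction(3, nu - 4)


def monte_carlo_penalty(nu: float, n: int, seed: int = 0) -> float:
    """Noisy empirical estimate of (kurtosis - 1) / 2 from n t-distributed draws."""
    rng = random.Random(seed)
    # T = Z / sqrt(V / nu) with V ~ chi^2_nu = Gamma(shape nu / 2, scale 2).
    xs = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)
          for _ in range(n)]
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return (m4 / m2 ** 2 - 1) / 2


if __name__ == "__main__":
    for nu in (5, 6, 10, 30):
        exact = variance_penalty(nu)
        print(f"nu={nu}: penalty={exact} (~{float(exact):.3f}), "
              f"closed form={closed_form_penalty(nu)}")
    # Sample kurtosis converges slowly under heavy tails; expect noise here.
    print("Monte Carlo, nu=20:", round(monte_carlo_penalty(20, 100_000), 3))
```

Under this illustration, at $\nu = 5$ the variance of a second-moment estimate is four times its Gaussian value, so roughly four times as many samples would be needed for the same precision; the factor tends to 1 as $\nu \to \infty$ and diverges as $\nu \to 4^{+}$.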
