Sample Complexity of Causal Identification with Temporal Heterogeneity

Ameya Rathod, Sujay Belsare, Salvik Krishna Nautiyal, Dhruv Laad, Ponnurangam Kumaraguru

TL;DR

The paper addresses learning causal structure from non-stationary observational data by jointly leveraging temporal heteroskedasticity and multi-environment shifts. It extends the identifiability analysis to heavy-tailed noise (multivariate Student's $t$) and derives finite-sample guarantees. These show that the identifiability conditions mirror the Gaussian case, while the sample complexity incurs a tail-dependent penalty. A key theoretical result is that, under rank-deficient heterogeneity, identifiability from second-order information requires a temporal window of length $T \ge \lceil d/r \rceil$. The information-theoretic lower bounds also reveal an intrinsic sample-efficiency penalty of $1 + \frac{3}{\nu-4}$ (for $\nu > 4$). Empirical results on synthetic data corroborate the theory, illustrating eigenvalue crowding as a practical bottleneck and validating the heavy-tail penalty in covariance-based recovery. Overall, the work shifts the focus from mere identifiability to practical recoverability, providing a principled framework for covariance-based causal discovery in non-stationary, heavy-tailed systems.
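The window condition $T \ge \lceil d/r \rceil$ can be read as a counting argument. Suppose each time step contributes at most $r$ independent directions of second-order variation. Then $T$ steps contribute at most $Tr$ directions, and spanning all $d$ dimensions needs $Tr \ge d$. The sketch below assumes this reading of $r$ as the rank of the per-step heterogeneity. That reading is our interpretation, not the paper's stated proof. The sketch checks the argument numerically by stacking $T$ generic random rank-$r$ factors. `min_window` and `stacked_rank` are hypothetical helper names.

```python
import numpy as np


def min_window(d: int, r: int) -> int:
    """Smallest T with T * r >= d, i.e. T = ceil(d / r)."""
    if d < 1 or r < 1:
        raise ValueError("d and r must be positive integers")
    return -(-d // r)  # integer ceiling division, exact for any ints


def stacked_rank(d: int, r: int, T: int, seed: int = 0) -> int:
    """Rank of T random d x r factors placed side by side.

    Illustrative assumption: each time step contributes a generic rank-r
    set of directions, so the stack generically has rank min(d, T * r).
    """
    rng = np.random.default_rng(seed)
    blocks = [rng.standard_normal((d, r)) for _ in range(T)]
    return int(np.linalg.matrix_rank(np.hstack(blocks)))


if __name__ == "__main__":
    d, r = 10, 3
    T_min = min_window(d, r)
    for T in range(1, T_min + 2):
        rank = stacked_rank(d, r, T)
        print(f"T={T}: rank={rank}, full rank: {rank == d}")
```

With $d = 10$ and $r = 3$, the stacked rank grows by $r$ per step until it saturates at $d$. Full rank first appears at $T = 4 = \lceil 10/3 \rceil$.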

Abstract

Recovering a unique causal graph from observational data is an ill-posed problem because multiple generating mechanisms can lead to the same observational distribution. The problem becomes solvable only by exploiting specific structural or distributional assumptions. While recent work has separately used time-series dynamics or multi-environment heterogeneity to constrain this problem, we integrate both as complementary sources of heterogeneity. This integration yields unified necessary identifiability conditions and enables a rigorous analysis of the statistical limits of recovery under thin- versus heavy-tailed noise. In particular, we show that temporal structure can effectively substitute for missing environmental diversity, potentially achieving identifiability even when environmental heterogeneity alone is insufficient. Extending this analysis to heavy-tailed (Student's $t$) distributions, we demonstrate that while the geometric identifiability conditions remain invariant, the sample complexity diverges significantly from the Gaussian baseline. Explicit information-theoretic bounds quantify this cost of robustness, establishing the fundamental limits of covariance-based causal graph recovery methods in realistic non-stationary systems. This work shifts the focus from whether causal structure is identifiable to whether it is statistically recoverable in practice.
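One standard way to see where a factor of the form $1 + \frac{3}{\nu-4}$ (the penalty quoted in the TL;DR) can arise is sketched below. It is offered as intuition, not as the paper's derivation. Take a univariate Student's $t$ variable with $\nu > 4$, standardized to unit variance. Its kurtosis is $\kappa = 3 + \frac{6}{\nu-4}$, and the asymptotic variance of the sample variance is proportional to $\kappa - 1$. Dividing by the Gaussian value ($\kappa - 1 = 2$) gives exactly $1 + \frac{3}{\nu-4}$. The sketch computes this ratio in closed form and checks it by Monte Carlo. The function names, sample size, and trial count are illustrative choices.

```python
import numpy as np


def gamma_closed_form(nu: float) -> float:
    """Heavy-tail penalty 1 + 3 / (nu - 4); defined only for nu > 4."""
    if nu <= 4:
        raise ValueError("Student's t has a finite fourth moment only for nu > 4")
    return 1.0 + 3.0 / (nu - 4.0)


def gamma_from_kurtosis(nu: float) -> float:
    """Same factor via kurtosis: (kappa - 1) / 2 with kappa = 3 + 6 / (nu - 4)."""
    kappa = 3.0 + 6.0 / (nu - 4.0)
    return (kappa - 1.0) / 2.0


def gamma_monte_carlo(nu: float, n: int = 100, trials: int = 40_000, seed: int = 0) -> float:
    """Ratio Var[s^2] (unit-variance Student's t) / Var[s^2] (standard Gaussian)."""
    rng = np.random.default_rng(seed)
    # Rescale t draws to unit variance, since Var[t_nu] = nu / (nu - 2).
    t = rng.standard_t(nu, size=(trials, n)) / np.sqrt(nu / (nu - 2.0))
    g = rng.standard_normal(size=(trials, n))
    s2_t = t.var(axis=1, ddof=1)  # one unbiased sample variance per trial
    s2_g = g.var(axis=1, ddof=1)
    return float(s2_t.var() / s2_g.var())


if __name__ == "__main__":
    for nu in (5.0, 10.0, 30.0):
        print(nu, gamma_closed_form(nu), gamma_from_kurtosis(nu))
    print("Monte Carlo ratio, nu=10:", gamma_monte_carlo(10.0))
```

At finite $n$, the exact ratio for $\nu = 10$ is $(3n - 1)/(2n)$, about $1.495$ for $n = 100$. The Monte Carlo estimate should therefore sit slightly below the asymptotic value $1.5$. As $\nu \to 4^+$ the fourth moment diverges, which matches the singularity shown in Figure 1.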

Paper Structure (50 sections, 8 theorems, 83 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Lemma 4.1

Let $\mathbf{X} = f(\mathbf{S})$ be the Structural Causal Model where $f$ is a diffeomorphism. Let $p(\mathbf{x})$ and $p^{(i)}(\mathbf{x})$ denote the observational densities in the base environment and auxiliary environment $i$, respectively. Let $\mathbf{x}^*$ be a fixed reference point such that where $\mathbf{\Omega}^{(s,i)} = D_{\mathbf{s}}^2 \log p_{\mathbf{S}}(\mu_{\mathbf{S}}) - D_{\mathb
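Lemma 4.1 evaluates second derivatives of log-densities at the source mean $\mu_{\mathbf{S}}$. The lemma's full statement is truncated above, so what follows is only a worked check of what such a Hessian looks like. Consider a multivariate Student's $t$ density in $d$ dimensions with location $\mu$, scale matrix $\Sigma$, and $\nu$ degrees of freedom. Its log-density is $\log p(\mathbf{s}) = c - \frac{\nu+d}{2}\log\big(1 + \frac{1}{\nu}(\mathbf{s}-\mu)^\top\Sigma^{-1}(\mathbf{s}-\mu)\big)$. The Hessian at $\mathbf{s} = \mu$ is $-\frac{\nu+d}{\nu}\Sigma^{-1}$, which tends to the Gaussian value $-\Sigma^{-1}$ as $\nu \to \infty$. The sketch compares this closed form with a central finite-difference Hessian. The helper names and example values are hypothetical.

```python
import numpy as np


def log_t_density_unnormalized(s, mu, sigma_inv, nu):
    """Multivariate Student's t log-density up to an additive constant."""
    diff = s - mu
    d = mu.shape[0]
    q = diff @ sigma_inv @ diff  # squared Mahalanobis distance
    return -0.5 * (nu + d) * np.log1p(q / nu)


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = x.shape[0]
    H = np.zeros((d, d))
    eye = np.eye(d)
    for i in range(d):
        for j in range(d):
            ei, ej = h * eye[i], h * eye[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H


def t_hessian_at_mean(sigma_inv, nu):
    """Closed form: Hessian of log p at s = mu is -(nu + d) / nu * Sigma^{-1}."""
    d = sigma_inv.shape[0]
    return -(nu + d) / nu * sigma_inv


if __name__ == "__main__":
    mu = np.array([0.5, -1.0, 2.0])
    sigma = np.array([[2.0, 0.3, 0.0],
                      [0.3, 1.0, 0.2],
                      [0.0, 0.2, 0.5]])  # symmetric positive definite
    sigma_inv = np.linalg.inv(sigma)
    nu = 6.0

    def f(s):
        return log_t_density_unnormalized(s, mu, sigma_inv, nu)

    print(np.round(numerical_hessian(f, mu), 5))
    print(np.round(t_hessian_at_mean(sigma_inv, nu), 5))
```

Because the gradient of the quadratic form vanishes at $\mu$, only the curvature term survives. This is why the Student's $t$ Hessian at the mean is a rescaled precision matrix.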

Figures (4)

  • Figure 1: Empirical validation of the heavy-tailed sample complexity penalty. (a) Near-singular regime ($\nu \to 4^+$, log scale): the empirical penalty tracks the theoretical divergence $\gamma(\nu) = 1 + \frac{3}{\nu - 4}$. (b) Wide-range behavior: tight agreement away from the singularity and smooth convergence to the Gaussian limit as $\nu \to \infty$.
  • Figure 2: Visualization of eigenvalue crowding. (a) Sorted eigenvalue magnitudes of the aggregated Hessian matrix $\Psi_{TE}$ for increasing system dimension $d$. For low $d$, the eigenvalues form distinct steps; for high $d$, they become densely crowded. (b) The mean spectral gap $\bar{\Delta}_\lambda = \mathbb{E}[\lambda_{i+1} - \lambda_i]$ decays rapidly as a function of $d$, making the eigenvalues hard for the algorithm to separate in practice.
  • Figure 3: Convergence rate under heavy-tailed noise. Log-log plots of covariance estimation error versus sample size. Across all degrees of freedom $\nu$, both Gaussian and Student-$t$ distributions exhibit an $O(1/N)$ convergence rate, while heavy tails induce a $\nu$-dependent constant shift that vanishes as $\nu$ increases.
  • Figure 4: Illustrative example of causal graph recovery on a synthetic Gaussian dataset with $d=5$. The left panel shows the ground-truth DAG, the middle panel shows the recovered edge weights, and the right panel shows the thresholded predicted graph. This example is provided for qualitative visualization only and is not used in the theoretical or empirical analysis.
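The mean spectral gap in Figure 2(b) telescopes. For eigenvalues sorted as $\lambda_1 \le \dots \le \lambda_d$, the average of the $d-1$ consecutive gaps is $(\lambda_d - \lambda_1)/(d-1)$. Whenever the spectrum stays inside a bounded interval, the mean gap must therefore shrink at least as fast as $1/(d-1)$. This is one way to see why crowding sets in as $d$ grows. The construction of $\Psi_{TE}$ is not reproduced here. The sketch therefore uses a hypothetical stand-in: a random positive semidefinite matrix rescaled so its largest eigenvalue is 1.

```python
import numpy as np


def mean_spectral_gap(M: np.ndarray) -> float:
    """Average gap between consecutive sorted eigenvalues of a symmetric matrix."""
    if M.shape[0] < 2:
        raise ValueError("need at least two eigenvalues")
    eigvals = np.linalg.eigvalsh(M)  # returned in ascending order
    return float(np.diff(eigvals).mean())


def random_unit_psd(d: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for Psi_TE: random PSD matrix with largest eigenvalue 1."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, d))
    A = G @ G.T
    return A / np.linalg.eigvalsh(A)[-1]


if __name__ == "__main__":
    for d in (5, 10, 20, 50, 100):
        gap = mean_spectral_gap(random_unit_psd(d))
        # Spectrum lies in [0, 1], so the telescoped mean gap is at most 1 / (d - 1).
        print(f"d={d:3d}  mean gap={gap:.4f}  bound={1 / (d - 1):.4f}")
```

Because the average telescopes, it depends only on the extreme eigenvalues. A small mean gap therefore does not by itself say which individual gaps are small. It only bounds how large they can be on average.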
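The $O(1/N)$ rate and $\nu$-dependent constant in Figure 3 can be reproduced in a few lines. This assumes the plotted error is the mean squared Frobenius error $\mathbb{E}\|\hat\Sigma - \Sigma\|_F^2$. The caption does not state the metric: on this squared scale the error decays as $1/N$, whereas the unsquared norm would decay as $1/\sqrt{N}$. The sketch draws zero-mean data with identity covariance from two sources: a Gaussian, and a multivariate Student's $t$ rescaled to unit covariance. It then fits the log-log slope. Dimensions, sample sizes, $\nu$, and trial counts are illustrative.

```python
import numpy as np


def sample(dist: str, n: int, d: int, nu: float, rng) -> np.ndarray:
    """Zero-mean draws with identity covariance."""
    z = rng.standard_normal((n, d))
    if dist == "gauss":
        return z
    # Multivariate t: one chi-square mixing variable per row, then rescale
    # by sqrt((nu - 2) / nu) so that the covariance is the identity.
    w = rng.chisquare(nu, size=(n, 1))
    return z / np.sqrt(w / nu) * np.sqrt((nu - 2.0) / nu)


def mse_frobenius(dist: str, n: int, d: int = 3, nu: float = 10.0,
                  trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E||S - I||_F^2 with S = X^T X / n (known zero mean)."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        X = sample(dist, n, d, nu, rng)
        S = X.T @ X / n
        errs.append(np.sum((S - np.eye(d)) ** 2))
    return float(np.mean(errs))


if __name__ == "__main__":
    Ns = np.array([100, 400, 1600, 6400])
    for dist in ("gauss", "t"):
        errs = np.array([mse_frobenius(dist, int(n)) for n in Ns])
        slope = np.polyfit(np.log(Ns), np.log(errs), 1)[0]
        print(dist, np.round(errs, 5), f"log-log slope = {slope:.2f}")
```

For $d = 3$ the exact values are $12/N$ for the Gaussian and $17/N$ for Student's $t$ with $\nu = 10$. Both curves have slope $-1$, and the heavy-tailed line is shifted up by $\log(17/12)$. That shift shrinks toward zero as $\nu$ grows.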

Theorems & Definitions (23)

  • Definition 3.1
  • Definition 3.4
  • Definition 3.5
  • Definition 3.6
  • Definition 3.7
  • Definition 3.8
  • Definition 3.9
  • Lemma 4.1
  • Theorem 4.2
  • Proposition 4.3
  • ...and 13 more