Theoretical Guarantees for Causal Discovery on Large Random Graphs

Mathieu Chevalley, Arash Mehrjou, Patrick Schwab

TL;DR

The paper tackles the problem of orientation errors in causal discovery under single-variable random interventions with latent confounding, formalized via $\epsilon$-interventional faithfulness. It develops finite-dimension deviation bounds for the false-negative rate (FNR) and establishes dimension-adaptive concentration results across three graph families: dense and sparse Erdős–Rényi DAGs and generalized Barabási–Albert (BA) graphs, with rates that decay as the graph grows. The main technical contributions are concentration results for the topological error $D_{\text{top}}$ and the normalized FNR $g$, derived using McDiarmid’s inequality and related bounds, and they reveal how network structure can regularize causal discovery. Empirical simulations corroborate the theory, showing concentration and often vanishing FNR as dimension increases, thus challenging the intuition that high dimensionality and heterogeneity hinder reliable causal orientation. The work provides principled guidance for designing interventional studies in large-scale systems and lays groundwork for extending guarantees to adaptive interventions and broader network models.

Abstract

We investigate theoretical guarantees for the false-negative rate (FNR), the fraction of true causal edges whose orientation is not recovered, under single-variable random interventions and an $\epsilon$-interventional faithfulness assumption that accommodates latent confounding. For sparse Erdős–Rényi directed acyclic graphs, where the edge probability scales as $p_e = \Theta(1/d)$, we show that the FNR concentrates around its mean at rate $O(\frac{\log d}{\sqrt{d}})$, implying that large deviations above the expected error become exponentially unlikely as dimensionality increases. This concentration ensures that derived upper bounds hold with high probability in large-scale settings. Extending the analysis to generalized Barabási–Albert graphs reveals an even stronger phenomenon: when the degree exponent satisfies $\gamma > 3$, the deviation width scales as $O(d^{\beta - \frac{1}{2}})$ with $\beta = 1/(\gamma - 1) < \frac{1}{2}$, and hence vanishes in the limit. This demonstrates that realistic scale-free topologies intrinsically regularize causal discovery, reducing variability in orientation error. These finite-dimension results provide the first dimension-adaptive, faithfulness-robust guarantees for causal structure recovery, and challenge the intuition that high dimensionality and network heterogeneity necessarily hinder accurate discovery. Our simulation results corroborate these theoretical predictions, showing that the FNR indeed concentrates and often vanishes in practice as dimensionality grows.
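
To get a feel for the two rates quoted in the abstract, the following minimal Python sketch evaluates them for a few graph sizes. It is only an illustration: the function names are hypothetical, the hidden constants in the $O(\cdot)$ bounds are assumed to be 1, and $\gamma = 3.5$ is an arbitrary example value satisfying $\gamma > 3$.

```python
import math


def er_sparse_width(d: int) -> float:
    """Rate log(d) / sqrt(d) quoted for sparse ER DAGs (constant factor assumed to be 1)."""
    return math.log(d) / math.sqrt(d)


def ba_width(d: int, gamma: float) -> float:
    """Rate d**(beta - 1/2) with beta = 1 / (gamma - 1), quoted for generalized BA graphs
    (constant factor assumed to be 1)."""
    beta = 1.0 / (gamma - 1.0)
    return d ** (beta - 0.5)


if __name__ == "__main__":
    # gamma = 3.5 is an illustrative choice, not a value taken from the paper.
    for d in (100, 1_000, 10_000, 100_000):
        print(f"d={d:>6}  sparse ER: {er_sparse_width(d):.4f}  BA (gamma=3.5): {ba_width(d, 3.5):.4f}")
```

Both columns shrink as $d$ grows. Because the constants are unknown, only the trend is meaningful, not the absolute values.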

Paper Structure

This paper contains 31 sections, 22 theorems, 93 equations, 7 figures, 1 table.

Key Result

Lemma 3

Define For each $k \in V$, let Then

Figures (7)

  • Figure 1: Interquartile range (IQR) of the FNR as a function of graph size $d$. For each graph family, results are shown across three density parameters and three values of intervention coverage $p_{\mathrm{int}}$. The IQR decreases with $d$, demonstrating vanishing variability as predicted by our theoretical results, except for scale-free BA graphs with $\kappa=1$, which correspond to the heavy-tailed regime with exponent $\gamma=\tfrac{7}{3}<3$.
  • Figure 2: Standard deviation of the FNR as a function of graph size $d$. For each graph family, results are shown across three density parameters and three values of intervention coverage $p_{\mathrm{int}}$. The deviation vanishes with growing $d$, in line with theory, except for scale-free BA graphs with $\kappa=1$, corresponding to a heavy-tailed regime with exponent $\gamma=\tfrac{7}{3}<3$.
  • Figure 3: Mean false-negative rate (FNR) versus graph size $d$ across Erdős–Rényi (ER), scale-free ER, and Barabási–Albert (BA) graphs. The solid lines with points denote empirical averages; lines without points show theoretical upper bounds from the additional theorems in the appendix. The bounds hold across all settings, with a slight mismatch at high intervention coverage ($p_{\mathrm{int}}=0.75$), likely due to optimization difficulties in DiffIntersort.
  • Figure 4: Mean unnormalized error $D_{\text{top}}$ versus graph size $d$ across Erdős–Rényi (ER), scale-free ER, and Barabási–Albert (BA) graphs.
  • Figure 5: IQR of unnormalized error $D_{\text{top}}$ versus graph size $d$ across Erdős–Rényi (ER), scale-free ER, and Barabási–Albert (BA) graphs.
  • ...and 2 more figures
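
The exception noted in Figures 1 and 2 can be checked against the exponent formula from the abstract. The short computation below is a sketch that assumes the expression $d^{\beta - \frac{1}{2}}$, with $\beta = 1/(\gamma - 1)$, is also applied outside its stated range $\gamma > 3$; the paper states the bound only for $\gamma > 3$.

```latex
% Worked exponent for the kappa = 1 case reported in the captions (gamma = 7/3).
\[
  \gamma = \tfrac{7}{3}
  \;\Longrightarrow\;
  \beta = \frac{1}{\gamma - 1} = \frac{1}{4/3} = \tfrac{3}{4} > \tfrac{1}{2},
  \qquad
  d^{\beta - \frac{1}{2}} = d^{\frac{1}{4}} \not\to 0 \quad (d \to \infty).
\]
```

Under that assumption the width expression grows with $d$ instead of vanishing, which is consistent with the non-vanishing variability the captions report for $\kappa = 1$.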

Theorems & Definitions (37)

  • Lemma 3: Lipschitz bound for intervention variables
  • Lemma 4: Restricted case
  • Lemma 5: Edge variables
  • Remark 6
  • Theorem 7: Deviation bounds for topological errors
  • Theorem 8: Deviation bounds for $f$ and $g$ in ER graphs
  • Theorem 9: Deviation bounds in the sparse regime $p_e=c/d$
  • Lemma 10: High-Probability Bound on Node Degrees
  • Theorem 11: Deviation Bounds in Generalized BA Graphs
  • Theorem 12: Bounded differences inequality (McDiarmid, 1989)
  • ...and 27 more
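
For reference, the bounded differences inequality named in Theorem 12 is commonly stated in the following one-sided form. This is the standard textbook statement, given here as background; the paper's exact formulation and notation may differ.

```latex
% Standard one-sided bounded differences (McDiarmid) inequality.
% X_1, ..., X_n are independent, and f changes by at most c_i
% when only its i-th argument is changed.
\[
  \Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}[f(X_1,\dots,X_n)] \ge t \bigr)
  \le \exp\!\left( -\frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right),
  \qquad t > 0 .
\]
```

The smaller the per-coordinate constants $c_i$, the tighter the concentration, which is why Lipschitz-type bounds such as Lemma 3 are a natural first step before applying the inequality.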