
Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery

Marco Ruiz, Miguel Arana-Catania, David R. Ardila, Rodrigo Ventura

Abstract

Time-series causal discovery methods rely on assumptions such as stationarity, regular sampling, and bounded temporal dependence. When these assumptions are violated, structure learning can produce confident but misleading causal graphs without warning. We introduce Causal-Audit, a framework that formalizes assumption validation as calibrated risk assessment. The framework computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, and confounding proxies), aggregates them into four calibrated risk scores with uncertainty intervals, and applies an abstention-aware decision policy that recommends methods (e.g., PCMCI+, VAR-based Granger causality) only when evidence supports reliable inference. The semi-automatic diagnostic stage can also be used independently for structured assumption auditing in individual studies. Evaluation on a synthetic atlas of 500 data-generating processes (DGPs) spanning 10 violation families demonstrates well-calibrated risk scores (AUROC > 0.95), a 62% false positive reduction among recommended datasets, and 78% abstention on severe-violation cases. On 21 external evaluations from TimeGraph (18 categories) and CausalTime (3 domains), recommend-or-abstain decisions are consistent with benchmark specifications in all cases. An open-source implementation of our framework is available.

Paper Structure

This paper contains 27 sections, 8 equations, 10 figures, and 15 tables.

Figures (10)

  • Figure 1: Framework overview. Tier 1 (Stage I alone) provides automatic diagnostics $\mathbf{d}$ across five assumption families for expert-guided assumption auditing. Tier 2 (Stages I--III) adds calibrated risk estimation with uncertainty intervals and an abstention-aware decision policy that recommends using or abstaining from a method $m^*$ according to its risk score $R$.
  • Figure 2: Assumption violations in causal discovery. Each row shows data violating an assumption (left) and the resulting causal graph with true and erroneous edges (right; see legend).
  • Figure 3: Time-series causal graph $\mathcal{G} = (V, E)$ for $N = 3$ variables with $\tau_{\max} = 2$. (a) Timeline representation: arrows between variable timelines encode causal effects; horizontal span equals the lag $\tau$. Each edge repeats at every time step (stationarity). (b) Summary causal graph: each directed edge is annotated with its lag, corresponding to the triple $(i, j, \tau) \in E$.
  • Figure 4: Detailed flowcharts for each pipeline stage. (a) Stage I: Diagnostic Auditing computes five diagnostic families from input $\mathbf{X}$, producing the diagnostic vector $\mathbf{d} = [d_1, \ldots, d_5]$. (b) Stage II: Risk Estimation transforms diagnostics into calibrated risk scores via logistic aggregation, isotonic calibration, and bootstrap uncertainty quantification. (c) Stage III: Decision Policy evaluates thresholds to output Recommend $m^*$ or Abstain.
  • Figure 5: Sigmoid mapping from linear predictor $z$ to risk probability $R_k \in [0,1]$ for VAR-Granger and PCMCI+ methods. Shaded regions illustrate decision zones for nonstationarity risk ($R_{\mathrm{nonstat}}$) using the hard constraints from Table \ref{tab:method-constraints}: both Granger and PCMCI+ are admissible when $R_k < \theta^{\mathrm{Granger}}_k = 0.60$, only PCMCI+ is admissible when $0.60 \leq R_k < \theta^{\mathrm{PCMCI+}}_k = 0.80$, and abstention is required when $R_k \geq 0.80$.
  • ...and 5 more figures
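The decision logic sketched in Figures 4–5 can be illustrated with a minimal example: a sigmoid maps the linear predictor $z$ to a risk score, and each method is admissible only while that score stays below its threshold. The function and variable names below are illustrative assumptions (not the paper's implementation); the thresholds are the nonstationarity values quoted in the Figure 5 caption.

```python
import math

def risk(z):
    """Sigmoid mapping from linear predictor z to risk probability in [0, 1],
    as in Figure 5 (calibration details from Stage II are omitted here)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hard constraints from Figure 5 for the nonstationarity risk score
# (hypothetical dict layout; the paper stores these per risk dimension).
THETA = {"VAR-Granger": 0.60, "PCMCI+": 0.80}

def decide(risk_score, thresholds):
    """Stage III decision policy sketch: a method m is admissible
    when risk_score < theta_m; if no method survives, abstain."""
    admissible = [m for m, theta in thresholds.items() if risk_score < theta]
    if not admissible:
        return "Abstain", []
    return "Recommend", admissible

print(decide(0.45, THETA))  # both methods admissible
print(decide(0.70, THETA))  # only PCMCI+ admissible
print(decide(0.90, THETA))  # abstain
```

As in the shaded zones of Figure 5, a single calibrated score partitions the unit interval into recommend-both, recommend-robust-only, and abstain regions.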