Large Causal Models for Temporal Causal Discovery

Nikolaos Kougioulis, Nikolaos Gkorgkolis, MingXue Wang, Bora Caglayan, Dario Simionato, Andrea Tonon, Ioannis Tsamardinos

TL;DR

A principled framework for LCMs is proposed, combining diverse synthetic generators with realistic time-series datasets, allowing learning at scale and demonstrating LCMs as a promising foundation-model paradigm for temporal causal discovery.

Abstract

Causal discovery for both cross-sectional and temporal data has traditionally followed a dataset-specific paradigm, where a new model is fitted for each individual dataset. Such an approach limits the potential of multi-dataset pretraining. The concept of large causal models (LCMs) envisions a class of pre-trained neural architectures specifically designed for temporal causal discovery. Prior approaches are constrained to small variable counts, degrade with larger inputs, and rely heavily on synthetic data, limiting generalization. We propose a principled framework for LCMs, combining diverse synthetic generators with realistic time-series datasets, allowing learning at scale. Extensive experiments on synthetic, semi-synthetic and realistic benchmarks show that LCMs scale effectively to higher variable counts and deeper architectures while maintaining strong performance. Trained models achieve competitive or superior accuracy compared to classical and neural baselines, particularly in out-of-distribution settings, while enabling fast, single-pass inference. Results demonstrate LCMs as a promising foundation-model paradigm for temporal causal discovery. Experiments and model weights are available at https://github.com/kougioulis/LCM-paper/.

Paper Structure (67 sections, 11 equations, 8 figures, 15 tables, 2 algorithms)

Figures (8)

  • Figure 1: Temporal causal dependencies represented as a (a) lagged causal graph and (b) binary adjacency tensor. Each slice $\mathbb{A}^{(\ell-1)}$ encodes edges at a discrete lag $\ell \leq \ell_{\max}$, where entry $\mathbb{A}^{(\ell-1)}_{j,i}=1$ denotes $V^i_{t-\ell} \to V^j_t$.
  • Figure 2: Overview of the large causal model (LCM) pipeline. (1) Synthetic and realistic TSCM generators produce training pairs of multivariate time series and their lagged causal graphs. (2) The LCM is trained via supervised learning on these pairs to discover a lagged adjacency tensor $\hat{\mathbb{A}}$ for a time series $\mathbf{X} \in \mathbb{R}^{L \times V}$, padded and normalized for stability. (3) At inference (CD phase), the pre-trained LCM predicts causal strengths on unseen datasets in a zero-shot manner.
  • Figure 3: A multivariate time series is embedded via Conv1D layers and positional encodings, processed through a Transformer encoder stack with optional distillation blocks, and augmented with lagged cross-correlations (training aids). A feedforward head outputs a lagged adjacency tensor representing the discovered temporal causal graph.
  • Figure 4: Running times (in seconds) for LCMs and baseline algorithms on the Synthetic_2 holdout set, averaged over 10 runs. Traditional methods (e.g., PCMCI & DYNOTEARS) scale superlinearly with lag and variable count, while Transformer-based LCMs remain effectively independent of input dimensionality due to their constant-time forward pass.
  • Figure 5: Empirical convergence of LCMs with increasing training data. Test AUC for 500K, 1M, and 2M parameter models trained on subsampled datasets. Validation/test sets are fixed to isolate the effect of data scale.
  • ...and 3 more figures
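
The adjacency-tensor encoding described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes zero-based variable indices and an edge list of `(i, j, lag)` tuples; the function name `build_lagged_adjacency` and the example edges are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def build_lagged_adjacency(edges, num_vars, max_lag):
    """Return a binary tensor A of shape (max_lag, num_vars, num_vars).

    Each edge (i, j, lag) means V^i_{t-lag} -> V^j_t and is stored as
    A[lag - 1, j, i] = 1, following the convention in the Figure 1 caption.
    Zero-based variable indices are an assumption of this sketch.
    """
    A = np.zeros((max_lag, num_vars, num_vars), dtype=np.int8)
    for i, j, lag in edges:
        if not 1 <= lag <= max_lag:
            raise ValueError(f"lag {lag} is outside 1..{max_lag}")
        if not (0 <= i < num_vars and 0 <= j < num_vars):
            raise ValueError(f"variable index out of range in edge {(i, j, lag)}")
        A[lag - 1, j, i] = 1  # slice index is lag - 1; row = effect, column = cause
    return A


# Hypothetical 3-variable graph: V0 -> V1 at lag 1, V1 -> V2 at lag 2,
# and V2 depending on its own previous value (lag 1).
edges = [(0, 1, 1), (1, 2, 2), (2, 2, 1)]
A = build_lagged_adjacency(edges, num_vars=3, max_lag=2)
```

Here `A[0]` holds all lag-1 edges and `A[1]` all lag-2 edges, so the row index of a nonzero entry is the effect variable and the column index is the cause.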

Theorems & Definitions (7)

  • Definition: Causal Markov Condition [spirtes2001causation, pearl2009causality]
  • Definition: Faithfulness [spirtes2001causation, pearl2009causality]
  • Definition: Causal Sufficiency [spirtes2001causation, pearl2009causality]
  • Definition: Causal Stationarity [runge2018causal]
  • Definition: Time-series Stationarity [brockwell1991time]
  • Definition: Lagged Causal Graph - Window Causal Graph
  • Definition: Summary Causal Graph