Large Causal Models for Temporal Causal Discovery

Nikolaos Kougioulis; Nikolaos Gkorgkolis; MingXue Wang; Bora Caglayan; Dario Simionato; Andrea Tonon; Ioannis Tsamardinos

TL;DR

We propose a principled framework for large causal models (LCMs) that combines diverse synthetic generators with realistic time-series datasets. The framework allows learning at scale and demonstrates LCMs as a promising foundation-model paradigm for temporal causal discovery.

Abstract

Causal discovery for both cross-sectional and temporal data has traditionally followed a dataset-specific paradigm, where a new model is fitted for each individual dataset. Such an approach limits the potential of multi-dataset pretraining. The concept of large causal models (LCMs) envisions a class of pre-trained neural architectures specifically designed for temporal causal discovery. Prior approaches are constrained to small variable counts, degrade with larger inputs, and rely heavily on synthetic data, limiting generalization. We propose a principled framework for LCMs, combining diverse synthetic generators with realistic time-series datasets, allowing learning at scale. Extensive experiments on synthetic, semi-synthetic and realistic benchmarks show that LCMs scale effectively to higher variable counts and deeper architectures while maintaining strong performance. Trained models achieve competitive or superior accuracy compared to classical and neural baselines, particularly in out-of-distribution settings, while enabling fast, single-pass inference. Results demonstrate LCMs as a promising foundation-model paradigm for temporal causal discovery. Experiments and model weights are available at https://github.com/kougioulis/LCM-paper/.

Paper Structure (67 sections, 11 equations, 8 figures, 15 tables, 2 algorithms)

Introduction
Temporal Structural Causal Models (TSCMs).
Lagged Causal Graphs.
Adjacency Tensor Representation.
Related Work
Positioning of Our Work.
Problem Formulation
Model Hyperparameters.
Model Overview
Input Embeddings.
Encoder Stack.
Training Aids.
Feedforward Head.
Loss Function
Edge Prediction Loss.
...and 52 more sections

Figures (8)

Figure 1: Temporal causal dependencies represented as a (a) lagged causal graph and (b) binary adjacency tensor. Each slice $\mathbb{A}^{(\ell-1)}$ encodes edges at a discrete lag $\ell \leq \ell_{\max}$, where entry $\mathbb{A}^{(\ell-1)}_{j,i}=1$ denotes $V^i_{t-\ell} \to V^j_t$.
Figure 2: Overview of the large causal model (LCM) pipeline. (1) Synthetic and realistic TSCM generators produce training pairs of multivariate time series and their lagged causal graphs. (2) The LCM is trained via supervised learning on these pairs to discover a lagged adjacency tensor $\hat{\mathbb{A}}$ for a time series $\mathbf{X} \in \mathbb{R}^{L \times V}$, padded and normalized for stability. (3) At inference (CD phase), the pre-trained LCM predicts causal strengths on unseen datasets in a zero-shot manner.
Figure 3: A multivariate time series is embedded via Conv1D layers and positional encodings, processed through a Transformer encoder stack with optional distillation blocks, and augmented with lagged cross-correlations (training aids). A feedforward head outputs a lagged adjacency tensor representing the discovered temporal causal graph.
Figure 4: Running times (in seconds) for LCMs and baseline algorithms on the Synthetic_2 holdout set, averaged over 10 runs. Traditional methods (e.g., PCMCI & DYNOTEARS) scale superlinearly with lag and variable count, while Transformer-based LCMs remain effectively independent of input dimensionality due to their constant-time forward pass.
Figure 5: Empirical convergence of LCMs with increasing training data. Test AUC for 500K, 1M, and 2M parameter models trained on subsampled datasets. Validation/test sets are fixed to isolate the effect of data scale.
...and 3 more figures
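
The indexing convention in the Figure 1 caption can be made concrete with a small sketch. It assumes 0-based variable indices and plain nested lists rather than any particular tensor library; the function name `build_adjacency_tensor` and the example edges are hypothetical and are not taken from the paper's code. Slice `lag - 1` holds the edges at lag `lag`, and the entry at row `target`, column `source` is set to 1 for an edge $V^{source}_{t-lag} \to V^{target}_t$.

```python
def build_adjacency_tensor(edges, n_vars, max_lag):
    """Build a binary lagged adjacency tensor as nested lists.

    edges: iterable of (source, target, lag) triples, with 0-based
           variable indices and 1 <= lag <= max_lag (illustrative format).
    Returns a list of max_lag slices, each an n_vars x n_vars matrix,
    where tensor[lag - 1][target][source] == 1 encodes
    source at time t - lag -> target at time t.
    """
    tensor = [[[0] * n_vars for _ in range(n_vars)] for _ in range(max_lag)]
    for source, target, lag in edges:
        if not 1 <= lag <= max_lag:
            raise ValueError(f"lag {lag} is outside 1..{max_lag}")
        if not (0 <= source < n_vars and 0 <= target < n_vars):
            raise ValueError("variable index out of range")
        # Rows index the effect (target), columns index the cause (source).
        tensor[lag - 1][target][source] = 1
    return tensor


# Hypothetical 3-variable example: V0 -> V1 at lag 1, V1 -> V2 at lag 2,
# and a lag-1 self-dependency V2 -> V2.
example_edges = [(0, 1, 1), (1, 2, 2), (2, 2, 1)]
A = build_adjacency_tensor(example_edges, n_vars=3, max_lag=2)

for lag, lag_slice in enumerate(A, start=1):
    print(f"lag {lag}:")
    for row in lag_slice:
        print("  ", row)
```

Putting the effect on the row axis and the cause on the column axis matches the caption's $\mathbb{A}^{(\ell-1)}_{j,i}$ ordering, so a row of any slice lists the lagged parents of one variable.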

Theorems & Definitions (7)

Definition: Causal Markov Condition (spirtes2001causation, pearl2009causality)
Definition: Faithfulness (spirtes2001causation, pearl2009causality)
Definition: Causal Sufficiency (spirtes2001causation, pearl2009causality)
Definition: Causal Stationarity (runge2018causal)
Definition: Time-series Stationarity (brockwell1991time)
Definition: Lagged Causal Graph - Window Causal Graph
Definition: Summary Causal Graph
