Embracing the black box: Heading towards foundation models for causal discovery from time series data

Gideon Stein, Maha Shadaydeh, Joachim Denzler

TL;DR

This work introduces Causal Pretraining (CP), a supervised, end-to-end framework that learns a direct mapping from multivariate time series to causal graphs, yielding causally pretrained neural networks (CPNNs) usable in zero-shot inference. By training on synthetic data with known causal structures and augmenting training with correlation-based regularization and feature injection, CP aims to scale to larger and more complex dynamics, with performance that improves alongside data and model size. Empirical results show that CP can match or exceed traditional baselines under broader training distributions and can extrapolate to real-world signals, suggesting the potential for a foundation model for causal discovery. The approach emphasizes parallelizable inference, scalability, and practical applicability, while acknowledging open theoretical questions about identifiability and the limits of zero-shot generalization.

Abstract

Many solutions exist for causal discovery from time series data, including those based on deep learning techniques. However, these methods typically do not adopt one of the most prevalent paradigms in deep learning: end-to-end learning. To address this gap, we explore what we call Causal Pretraining, a methodology that aims to learn a direct mapping from multivariate time series to the underlying causal graphs in a supervised manner. Our empirical findings suggest that causal discovery in a supervised manner is possible, assuming that the training and test time series samples share most of their dynamics. More importantly, we found evidence that the performance of Causal Pretraining can increase with data and model size, even if the additional data do not share the same dynamics. Further, we provide examples where causal discovery for real-world data with causally pretrained neural networks is possible within limits. We argue that this hints at the possibility of a foundation model for causal discovery.
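
To make the supervised setup concrete, the following is a minimal sketch of how one training pair (a synthetic time series plus its known causal graph) might be generated. It is not the authors' data pipeline: the linear lagged process, the function name `sample_training_pair`, the graph layout, and all parameter defaults are assumptions chosen for illustration.

```python
import numpy as np


def sample_training_pair(n_vars=3, max_lag=2, n_steps=200, edge_prob=0.3, seed=0):
    """Return (series, graph) for one hypothetical training sample.

    graph[i, j, k] == 1 means variable j at time t-(k+1) drives variable i at time t.
    The linear dynamics and every default value are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Binary lagged adjacency tensor: the supervised label a network would predict.
    graph = (rng.random((n_vars, n_vars, max_lag)) < edge_prob).astype(np.int64)

    # Random signed coefficients on the active edges only.
    magnitudes = rng.uniform(0.2, 0.5, size=graph.shape)
    signs = rng.choice([-1.0, 1.0], size=graph.shape)
    coeffs = graph * magnitudes * signs

    # Cap the absolute coefficient sum per target variable at 0.9,
    # a simple sufficient condition that keeps the simulated values bounded.
    row_sum = np.abs(coeffs).sum(axis=(1, 2), keepdims=True)
    coeffs = coeffs / np.maximum(1.0, row_sum / 0.9)

    series = np.zeros((n_steps + max_lag, n_vars))
    series[:max_lag] = rng.normal(size=(max_lag, n_vars))
    for t in range(max_lag, n_steps + max_lag):
        # lagged[j, k] holds variable j at time t-(k+1).
        lagged = series[t - max_lag:t][::-1].T
        noise = rng.normal(scale=0.1, size=n_vars)
        series[t] = np.einsum("ijk,jk->i", coeffs, lagged) + noise

    # Drop the random warm-up rows so the series has exactly n_steps rows.
    return series[max_lag:], graph


series, graph = sample_training_pair()
print(series.shape, graph.shape)
```

A supervised learner in this setting would take `series` as input and be trained to output `graph`; many such pairs with different seeds would form a training set.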

Paper Structure

This paper contains 24 sections, 4 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison between a general depiction of causal discovery methods (left) and our Causal Pretraining methodology (right). Instead of inferring causal graphs from data directly, Causal Pretraining produces neural networks that can be deployed for inference directly.
  • Figure 2: Depiction of architectures that we consider for Causal Pretraining. From left to right: GRU, Transformer, ConvMixer, MLP. We further mark locations for correlation injection with *.
  • Figure 3: Relationship between model size and performance on datasets with an increased number of variables. Each data point shows the performance on Test-set 2.
  • Figure 4: Comparison of inference speed. Here, we use the architecture with the highest AUROC score for each dataset to represent CPNNs. We report the speed of computing the solution for 500 samples and 100 repetitions.
  • Figure 5: Visualization of CR for $\alpha = 1.5$ and $\beta = 0.15$. The penalty is large only when the confidence is high while the correlation coefficient is low.
  • ...and 1 more figure