CauScale: Neural Causal Discovery at Scale

Bo Peng; Sirui Chen; Jiaguo Tian; Yu Qiao; Chaochao Lu

TL;DR

CauScale tackles the scalability bottlenecks in causal discovery by presenting a neural, amortized approach with a two-stream architecture that couples data-driven relational evidence with graph priors. It introduces a reduction unit and tied attention to dramatically cut memory and compute, and a DataGraph block to preserve essential signals despite compression. Across synthetic and semi-synthetic gene networks, CauScale achieves near-perfect in-distribution accuracy ($mAP$ up to $99.6\%$) and strong generalization to larger graphs and OOD mechanisms, while delivering massive speedups (up to $13{,}000\times$) over prior methods. The work demonstrates practical scalability to graphs with up to $1000$ nodes and highlights its potential as a pre-training direction for efficient causal-discovery models at scale.

Abstract

Causal discovery is essential for advancing data-driven fields such as scientific AI and data analysis, yet existing approaches face significant time- and space-efficiency bottlenecks when scaling to large graphs. To address this challenge, we present CauScale, a neural architecture designed for efficient causal discovery that scales inference to graphs with up to 1000 nodes. CauScale improves time efficiency via a reduction unit that compresses data embeddings, and improves space efficiency by adopting tied attention weights, which avoids maintaining axis-specific attention maps. To maintain high causal discovery accuracy, CauScale adopts a two-stream design: a data stream extracts relational evidence from high-dimensional observations, while a graph stream integrates statistical graph priors and preserves key structural signals. CauScale successfully scales to 500-node graphs during training, where prior work fails due to space limitations. Across testing data with varying graph scales and causal mechanisms, CauScale achieves 99.6% mAP on in-distribution data and 84.4% on out-of-distribution data, while delivering inference speedups of 4× to 13,000× over prior methods. Our project page is at https://github.com/OpenCausaLab/CauScale.
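The abstract states only that tied attention weights avoid maintaining axis-specific attention maps. As a rough illustration of why tying saves memory, the sketch below compares ordinary per-observation attention over the node axis with a tied variant that sums the attention logits over observations and keeps a single shared map. The tying scheme (summing logits and scaling by the square root of `n_obs * dim`), the tensor layout `(n_obs, n_nodes, dim)`, and the function names are assumptions made for illustration; they are not taken from the CauScale implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def untied_attention(q, k, v):
    # q, k, v: (n_obs, n_nodes, dim). One attention map per observation.
    logits = np.einsum("nid,njd->nij", q, k) / np.sqrt(q.shape[-1])
    attn = softmax(logits, axis=-1)            # (n_obs, n_nodes, n_nodes)
    return np.einsum("nij,njd->nid", attn, v), attn

def tied_attention(q, k, v):
    # Assumed tying scheme: sum the logits over observations so that a
    # single (n_nodes, n_nodes) map is shared by every observation.
    n_obs, _, dim = q.shape
    logits = np.einsum("nid,njd->ij", q, k) / np.sqrt(n_obs * dim)
    attn = softmax(logits, axis=-1)            # (n_nodes, n_nodes)
    return np.einsum("ij,njd->nid", attn, v), attn

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 5, 4)) for _ in range(3))
out_untied, attn_untied = untied_attention(q, k, v)
out_tied, attn_tied = tied_attention(q, k, v)
# The tied map stores n_obs times fewer attention weights.
print(attn_untied.size, attn_tied.size)
```

In this sketch the output keeps the same shape in both variants; only the stored attention map shrinks, from one map per observation to a single shared map.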

Paper Structure (44 sections, 6 equations, 7 figures, 2 tables)

Introduction
Related Work
Preliminary
CauScale
Overall Architecture
DataGraph Block
Data2Graph layer.
Graph layer.
Reduction Unit
Tied Attention Weights
Prediction Head
Efficiency Analysis
Experiment
Settings
Baselines.
...and 29 more sections

Figures (7)

Figure 1: The architecture of CauScale. (a) The overall architecture and how the data embedding size changes as it passes through the network. (b) The reduce operation in the reduction unit. After every $k$ data-graph blocks, the reduction unit pools the data embedding along the observation dimension, reducing it according to the fraction $r$.
Figure 2: Structure of the DataGraph block. The data-graph block processes information on both the data stream and the graph stream. On the data stream, the data axial attention layer produces the data embedding $h_b^D$, which is passed to the next module on the data stream and also summarized by the data2graph layer into a graph message $\omega_b^{D\to G}$. This message is concatenated with the previous graph embedding $h_{b-1}^G$ and processed by the graph layer on the graph stream.
Figure 3: Comparison of w/ and w/o Reduction Unit.
Figure 4: Advantage of data-graph block over the block containing the data layer only.
Figure 5: Ablation on components: Ours vs. AVICI.
...and 2 more figures
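The reduction step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a data embedding of shape `(n_obs, n_nodes, hidden)`, mean pooling over contiguous groups of observations, and that $r$ is the fraction of observations kept; the caption specifies none of these details. The names `reduce_observations` and `run_data_stream` are hypothetical, and the data-graph block itself is replaced by an identity placeholder.

```python
import numpy as np

def reduce_observations(h, r):
    """Mean-pool the observation axis so that about r * n_obs rows remain.

    h: array of shape (n_obs, n_nodes, hidden); r: assumed keep fraction in (0, 1].
    """
    n_obs = h.shape[0]
    n_keep = max(1, int(round(n_obs * r)))
    # Split observation indices into n_keep nearly equal contiguous groups
    # and average the embeddings inside each group.
    groups = np.array_split(np.arange(n_obs), n_keep)
    return np.stack([h[idx].mean(axis=0) for idx in groups])

def run_data_stream(h, n_blocks, k, r, block=lambda x: x):
    """Apply n_blocks blocks, reducing after every k-th block.

    `block` stands in for a data-graph block (identity placeholder here).
    Returns the final embedding and the shape after each step.
    """
    shapes = [h.shape]
    for b in range(1, n_blocks + 1):
        h = block(h)
        if b % k == 0:
            h = reduce_observations(h, r)
        shapes.append(h.shape)
    return h, shapes

h0 = np.arange(8 * 3 * 4, dtype=float).reshape(8, 3, 4)
_, shapes = run_data_stream(h0, n_blocks=4, k=2, r=0.5)
print(shapes)
```

Under these assumptions, each reduction shrinks the observation axis geometrically, so later blocks operate on much smaller embeddings than earlier ones.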
