CauScale: Neural Causal Discovery at Scale

Bo Peng, Sirui Chen, Jiaguo Tian, Yu Qiao, Chaochao Lu

TL;DR

CauScale tackles the scalability bottlenecks in causal discovery by presenting a neural, amortized approach with a two-stream architecture that couples data-driven relational evidence with graph priors. It introduces a reduction unit and tied attention to dramatically cut memory and compute, and a DataGraph block to preserve essential signals despite compression. Across synthetic and semi-synthetic gene networks, CauScale achieves near-perfect in-distribution accuracy ($mAP$ up to $99.6\%$) and strong generalization to larger graphs and OOD mechanisms, while delivering massive speedups (up to $13{,}000\times$) over prior methods. The work demonstrates practical scalability to graphs with up to $1000$ nodes and highlights its potential as a pre-training direction for efficient causal-discovery models at scale.

Abstract

Causal discovery is essential for advancing data-driven fields such as scientific AI and data analysis, yet existing approaches face significant time- and space-efficiency bottlenecks when scaling to large graphs. To address this challenge, we present CauScale, a neural architecture designed for efficient causal discovery that scales inference to graphs with up to 1000 nodes. CauScale improves time efficiency via a reduction unit that compresses data embeddings and improves space efficiency by adopting tied attention weights to avoid maintaining axis-specific attention maps. To keep high causal discovery accuracy, CauScale adopts a two-stream design: a data stream extracts relational evidence from high-dimensional observations, while a graph stream integrates statistical graph priors and preserves key structural signals. CauScale successfully scales to 500-node graphs during training, where prior work fails due to space limitations. Across testing data with varying graph scales and causal mechanisms, CauScale achieves 99.6% mAP on in-distribution data and 84.4% on out-of-distribution data, while delivering 4-13,000 times inference speedups over prior methods. Our project page is at https://github.com/OpenCausaLab/CauScale.
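The abstract does not spell out how the tied attention weights are computed. As a rough illustration only, the sketch below assumes one common formulation of tying: query-key scores are summed over the observation axis, so a single variable-by-variable attention map is shared by all observations instead of one map per observation. The function name `tied_attention`, the weight matrices, the tensor layout `(n_obs, n_vars, d)`, and the scaling factor are all assumptions made for this example and are not taken from the CauScale code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tied_attention(h, w_q, w_k, w_v):
    """Attend over variables with one attention map shared by all observations.

    h: array of shape (n_obs, n_vars, d). Hypothetical layout, assumed here.
    Returns the attended embedding (n_obs, n_vars, d) and the shared
    attention map (n_vars, n_vars).
    """
    n_obs, n_vars, d = h.shape
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    # Summing scores over the observation axis gives a single
    # (n_vars, n_vars) map rather than n_obs separate maps.
    scores = np.einsum("oid,ojd->ij", q, k) / np.sqrt(d * n_obs)
    attn = softmax(scores, axis=-1)
    # The same map is applied to the values of every observation.
    out = np.einsum("ij,ojd->oid", attn, v)
    return out, attn


rng = np.random.default_rng(0)
n_obs, n_vars, d = 8, 5, 4
h = rng.normal(size=(n_obs, n_vars, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = tied_attention(h, w_q, w_k, w_v)
```

Under this assumption, the attention map stored for one layer has `n_vars * n_vars` entries, whereas an untied version would store `n_obs * n_vars * n_vars`, which is where the space saving would come from.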

Paper Structure (44 sections, 6 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: The architecture of CauScale. (a) The overall architecture and the changes in data embedding size during network processing. (b) The reduce operation in the reduction unit. After every $k$ data-graph blocks, the reduction unit pools the data embedding along the observation dimension, reducing it according to a fraction $r$.
  • Figure 2: Structure of the DataGraph block. The data-graph block processes information on both the data stream and the graph stream. On the data stream, the data embedding $h_b^D$ produced by the data axial attention layer is passed to the next module on the data stream and is also summarized by the data2graph layer into a graph message $\omega_b^{D\to G}$. This message is concatenated with the previous graph embedding $h_{b-1}^G$ and processed by the graph layer on the graph stream.
  • Figure 3: Comparison of w/ and w/o Reduction Unit.
  • Figure 4: Advantage of data-graph block over the block containing the data layer only.
  • Figure 5: Ablation on components: Ours vs. AVICI.
  • ...and 2 more figures
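
The reduce operation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the data embedding has shape `(n_obs, n_vars, d)`, that pooling is a mean over consecutive groups of observations, and that $r$ is the fraction of observations kept after each reduction. The function name `reduce_observations` and the values of `k`, `r`, and `n_blocks` are hypothetical.

```python
import numpy as np


def reduce_observations(h, r):
    """Mean-pool h along the observation axis so about r * n_obs rows remain.

    h: array of shape (n_obs, n_vars, d). Layout assumed for this sketch.
    r: fraction of observations kept (assumed reading of the caption).
    """
    n_obs = h.shape[0]
    n_keep = max(1, int(round(n_obs * r)))
    # Split observation indices into n_keep consecutive groups and average each.
    groups = np.array_split(np.arange(n_obs), n_keep)
    return np.stack([h[idx].mean(axis=0) for idx in groups])


rng = np.random.default_rng(0)
h = rng.normal(size=(64, 5, 4))
h0 = h.copy()

k, r, n_blocks = 2, 0.5, 6  # hypothetical settings
sizes = []
for b in range(1, n_blocks + 1):
    # A data-graph block would update h here; omitted in this sketch.
    if b % k == 0:
        h = reduce_observations(h, r)
    sizes.append(h.shape[0])
```

With these settings the observation dimension halves after every second block, so the cost of later blocks shrinks with depth, while the variable and feature dimensions are left unchanged.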