A Knowledge-Informed Pretrained Model for Causal Discovery

Wenbo Xu; Yue He; Yunhai Wang; Xingxuan Zhang; Kun Kuang; Yueguo Chen; Peng Cui

Abstract

Causal discovery has been widely studied, yet many existing methods rely on strong assumptions or fall into one of two extremes: they either depend on costly interventional signals or partial ground truth as strong priors, or adopt purely data-driven paradigms with limited guidance. Both extremes hinder practical deployment. Motivated by real-world scenarios where only coarse domain knowledge is available, we propose a knowledge-informed pretrained model for causal discovery that integrates weak prior knowledge as a principled middle ground. Our model adopts a dual-source encoder-decoder architecture to process observational data in a knowledge-informed way. We design a diverse pretraining dataset and a curriculum learning strategy that smoothly adapts the model to varying prior strengths across mechanisms, graph densities, and variable scales. Extensive experiments on in-distribution, out-of-distribution, and real-world datasets demonstrate consistent improvements over existing baselines, with strong robustness and practical applicability.

Paper Structure (30 sections, 6 equations, 6 figures, 4 tables)

Introduction
Related Work
Preliminary Study
Problem of Causal Discovery
Motivation
Knowledge Encoding
Model Architecture
Dual-Source Encoder
Dual-Source Alignment
Alternating Attention Mechanism
Cross Attention Mechanism
Graph Decoder
Permutation Matrices
Lower Triangle Decoder
Model Training
...and 15 more sections

Figures (6)

Figure 1: Simple causality in chemistry. (A) Expert knowledge is structured as graph. (B) The corresponding ground-truth causal graph.
Figure 2: An illustration of knowledge encoding. (A) A symbolic form of Fig. 1. (B) The corresponding adjacency matrix $\mathbf{A}$. (C) The reachability matrix $\mathbf{R}$, which further encodes indirect relations (e.g., $R_{A,C}=1$). (D) The resulting prior knowledge $G_P$, where infeasible relations are excluded and self-loops are removed.
Figure 3: (A) The overall pipeline of Kode, illustrating the global framework that integrates raw data and prior knowledge to infer the causal graph; (B) The dual-source encoder, which achieves alignment and feature compression from hybrid inputs; (C) The graph decoder, responsible for mapping the latent representations back into the final causal structure.
Figure 4: Results on 20 nodes, covering i.d. linear cases and both i.d. and o.o.d. nonlinear mechanisms.
Figure 5: Performance on Gaussian Process ER(3) across scales: Kode maintains a clear advantage; GES ($4^{th}$ bar) collapses owing to excessive density at $N=30$ and $40$, so its results are omitted.
...and 1 more figures
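
The knowledge-encoding steps named in the Figure 2 caption (adjacency matrix, reachability matrix, prior with self-loops removed) can be illustrated with a small sketch. This is not the authors' implementation: the function names `reachability` and `prior_mask` are hypothetical, the three-node chain A -> B -> C is an invented toy input, and reading $G_P$ as "the reachability matrix with its diagonal cleared" is an assumption made only for illustration, since the exact rule for excluding infeasible relations is not given here.

```python
import numpy as np


def reachability(adj):
    """Transitive closure of a directed graph via Warshall's algorithm.

    Entry (i, j) is 1 when j can be reached from i along a directed
    path of length >= 1 in the expert-knowledge graph.
    """
    reach = np.asarray(adj, dtype=bool).copy()
    n = reach.shape[0]
    for k in range(n):
        # i reaches j if it already does, or if i reaches k and k reaches j.
        reach |= reach[:, k, None] & reach[None, k, :]
    return reach.astype(int)


def prior_mask(adj):
    """Hypothetical reading of G_P: reachable pairs with self-loops removed.

    ASSUMPTION: the paper's actual exclusion rule may differ.
    """
    mask = reachability(adj)
    np.fill_diagonal(mask, 0)
    return mask


# Toy expert knowledge over variables (A, B, C): A -> B and B -> C.
A = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

R = reachability(A)   # adds the indirect relation A -> C, so R[0, 2] == 1
G_P = prior_mask(A)   # same as R here, because a chain has no cycles
```

In this toy case the indirect relation from A to C appears in `R` even though `A` has no direct edge between them, which matches the caption's example $R_{A,C}=1$.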
