Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning

Jiaru Zhang, Rui Ding, Qiang Fu, Bojun Huang, Zizhen Deng, Yang Hua, Haibing Guan, Shi Han, Dongmei Zhang

TL;DR

The paper addresses systematic bias in Deep Neural Network–based supervised causal learning (SCL) arising from predicting directed edges independently (the Node-Edge paradigm). It proposes SiCL, which predicts MEC-identifiable structures—a skeleton and a set of v-structures—via a pairwise-encoded architecture with unidirectional attention, enabling consistent estimators under the canonical MEC setting. The authors prove the identifiability advantage of skeleton and v-structures over edges, and demonstrate through extensive experiments on synthetic data and the Sachs benchmark that SiCL substantially outperforms state-of-the-art DNN-based SCL methods, including notable gains in SHD on real data. The work suggests a shift toward learning identifiable causal structures to mitigate fundamental bias and achieve robust causal discovery in practice, while acknowledging computational considerations and avenues for future improvements.

Abstract

Causal discovery is a structured prediction task that aims to predict causal relations among variables based on their data samples. Supervised Causal Learning (SCL) is an emerging paradigm in this field. Existing Deep Neural Network (DNN)-based methods commonly adopt the "Node-Edge approach", in which the model first computes an embedding vector for each variable-node, then uses these variable-wise representations to concurrently and independently predict each directed causal edge. In this paper, we first show that this architecture has a systematic bias that cannot be mitigated regardless of model size and data size. We then propose SiCL, a DNN-based SCL method that predicts a skeleton matrix together with a v-tensor (a third-order tensor representing the v-structures). According to Markov Equivalence Class (MEC) theory, both the skeleton and the v-structures are identifiable causal structures under the canonical MEC setting, so predictions about them do not suffer from the identifiability limit in causal discovery. SiCL can therefore avoid the systematic bias of the Node-Edge architecture and enable consistent estimators for causal discovery. Moreover, SiCL is equipped with a specially designed pairwise encoder module with a unidirectional attention layer to model both internal and external relationships of pairs of nodes. Experimental results on both synthetic and real-world benchmarks show that SiCL significantly outperforms other DNN-based SCL approaches.
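To make the two prediction targets concrete, the following is a minimal sketch of how a skeleton matrix and a v-tensor could be derived from a ground-truth DAG. It is an illustration only, not the authors' code: the function name `skeleton_and_v_tensor`, the adjacency convention (`adj[i, j]` means an edge from `i` to `j`), and the index order of the v-tensor (`v[i, j, k]` marks the v-structure `i -> k <- j`) are all assumptions made here.

```python
import numpy as np

def skeleton_and_v_tensor(adj):
    """Hypothetical helper: derive the skeleton and v-structures of a DAG.

    adj[i, j] == 1 is assumed to mean a directed edge i -> j.
    Returns (skeleton, v_tensor), where v_tensor[i, j, k] is True when
    i -> k <- j and i, j are not adjacent (index order is an assumption).
    """
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    # The skeleton drops edge directions: i - j if either i -> j or j -> i.
    skeleton = adj | adj.T
    # both_parents[i, j, k] is True when i and j are both parents of k.
    both_parents = adj[:, None, :] & adj[None, :, :]
    # A v-structure additionally requires i != j and no edge between i and j.
    nonadjacent = ~skeleton & ~np.eye(n, dtype=bool)
    v_tensor = both_parents & nonadjacent[:, :, None]
    return skeleton, v_tensor

# Collider 0 -> 2 <- 1: contains one v-structure (stored symmetrically).
collider = np.array([[0, 0, 1],
                     [0, 0, 1],
                     [0, 0, 0]])
# Chain 0 -> 1 -> 2 and its reversal 2 -> 1 -> 0: no v-structures.
chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
reversed_chain = chain.T

s_collider, v_collider = skeleton_and_v_tensor(collider)
s_chain, v_chain = skeleton_and_v_tensor(chain)
s_rev, v_rev = skeleton_and_v_tensor(reversed_chain)
```

In this sketch the chain and its reversal produce identical targets (same skeleton, empty v-tensor), which is the sense in which these structures are shared across a Markov equivalence class, whereas the directed adjacency matrices of the two graphs differ.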

Paper Structure

This paper contains 37 sections, 6 theorems, 18 equations, 10 figures, 11 tables, 2 algorithms.

Key Result

Proposition 3.1

Let $\mathcal{G}_n$ be the set of graphs with $n+1$ nodes where there is a central node $y$ such that (1) every other node is connected to $y$, (2) there is no edge between the other nodes, and (3) there is at most one edge pointing to $y$. We have

Figures (10)

  • Figure 1: The inference workflow of SiCL.
  • Figure 2: Comparison of SiCL-Node-Edge and SiCL-no-PF in o-F1 trend as observation samples increase on a constructed dataset.
  • Figure A3: Illustration of the pairwise encoder module. In Part ①, it initializes raw pairwise features. In Part ②, a unidirectional attention layer is applied to utilize information from node features and pairwise features. In Part ③, an MLP and a residual connection are used to yield the final pairwise features.
  • Figure A4: The problem setting to emphasize the limitations of the Node-Edge approach. Best viewed in color.
  • Figure A5: Illustration of the architecture comparison of Node-Edge models, SiCL-no-PF and SiCL.
  • ...and 5 more figures

Theorems & Definitions (18)

  • Proposition 3.1
  • Definition A1.1: Global Markov Property (GMP)
  • Definition A1.2: Canonical Assumption
  • Definition A1.3: Skeleton
  • Definition A1.4: Unshielded Triples (UTs) and V-structures
  • Definition A1.5: Separation Set
  • Definition A1.6: Skeleton Predictor
  • Proposition A1.1: Existence of a Perfect Skeleton Predictor
  • proof
  • Theorem A1.1
  • ...and 8 more