Ordering-Based Causal Discovery for Linear and Nonlinear Relations

Zhuopeng Xu; Yujie Li; Cheng Liu; Ning Gui

TL;DR

CaPS introduces a novel identification criterion for topological ordering and incorporates the concept of "parent score" during the post-processing optimization stage, helping to accelerate the pruning process and correct inaccurate predictions in the pruning step.

Abstract

Identifying causal relations from purely observational data typically requires additional assumptions on relations and/or noise. Most current methods restrict their analysis to datasets that are assumed to have purely linear or purely nonlinear relations, which often does not reflect real-world datasets that contain a combination of both. This paper presents CaPS, an ordering-based causal discovery algorithm that effectively handles both linear and nonlinear relations. CaPS introduces a novel identification criterion for topological ordering and incorporates the concept of "parent score" during the post-processing optimization stage. These scores quantify the strength of the average causal effect, helping to accelerate the pruning process and correct inaccurate predictions in the pruning step. Experimental results demonstrate that our proposed solutions outperform state-of-the-art baselines on synthetic data with varying ratios of linear and nonlinear relations. The results obtained from real-world data also support the competitiveness of CaPS. Code and datasets are available at https://github.com/E2real/CaPS.
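
To make the mixed setting concrete, the following is a minimal sketch of how data with a chosen ratio of linear to nonlinear relations could be sampled from a random DAG. It is not the authors' data generator: the function name `sample_mixed_sem`, its parameters, the `sin` nonlinearity, the weight range, and the Gaussian additive noise are all assumptions made for illustration.

```python
import numpy as np


def sample_mixed_sem(n_nodes=5, n_samples=1000, expected_degree=1,
                     linear_proportion=0.5, seed=0):
    """Sample from a random DAG whose edges mix linear and nonlinear effects.

    Hypothetical helper: every modelling choice here is an assumption.
    """
    rng = np.random.default_rng(seed)

    # Erdos-Renyi style DAG: only edges i -> j with i < j, so it is acyclic
    # and 0, 1, ..., n_nodes - 1 is a valid topological ordering.
    p_edge = min(1.0, 2.0 * expected_degree / max(n_nodes - 1, 1))
    adj = np.triu(rng.random((n_nodes, n_nodes)) < p_edge, k=1)

    # Each existing edge is linear with probability `linear_proportion`
    # (0.0 -> all nonlinear, 1.0 -> all linear).
    is_linear = adj & (rng.random((n_nodes, n_nodes)) < linear_proportion)

    # Random edge weights with random sign (assumed range).
    signs = rng.choice([-1.0, 1.0], size=(n_nodes, n_nodes))
    weights = rng.uniform(0.5, 2.0, size=(n_nodes, n_nodes)) * signs

    X = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):
        x_j = rng.normal(size=n_samples)  # additive Gaussian noise (assumed)
        for i in np.flatnonzero(adj[:, j]):
            parent = X[:, i]
            if is_linear[i, j]:
                x_j += weights[i, j] * parent
            else:
                x_j += weights[i, j] * np.sin(parent)  # assumed nonlinearity
        X[:, j] = x_j
    return X, adj, is_linear


X, adj, is_linear = sample_mixed_sem(linear_proportion=0.5)
print(X.shape, int(adj.sum()), int(is_linear.sum()))
```

Sweeping `linear_proportion` from 0.0 to 1.0 gives a family of datasets of the kind the abstract describes, on which a purely linear or purely nonlinear method can be compared with a method intended for both.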

Paper Structure (31 sections, 2 theorems, 24 equations, 11 figures, 3 tables, 2 algorithms)

Introduction
Related Work
Preliminaries
Structural Equation Model
Topological Ordering
Causal Discovery with Parent Score
Leaf Nodes Discrimination
Parent Score
Pre-pruning and Edge Supplement
Computational Complexity
Experiments
Baselines and Settings
Synthetic Data
Real Data
Analysis Experiments
...and 16 more sections

Key Result

Theorem 1

Let $s(x) = \nabla \log p(x)$ be the score function, and let $\text{diag}(\cdot)$ denote the diagonal elements of a matrix. For any $x_j$ in the causal graph $\mathcal{G}$:

Figures (11)

Figure 1: Performance of different solutions on datasets with different linear proportions. Since we do not know in advance whether real data is linear or nonlinear, it is difficult to choose an effective model. We therefore need a method that works well in linear, nonlinear, and, most likely, mixed cases.
Figure 2: Results on SynER1 and SynER4 with different linear proportions, where a linear proportion of 0.0 means all relations are nonlinear and 1.0 means all relations are linear.
Figure 3: F1 score and training time on SynER1 with larger-scale causal graphs.
Figure 4: Visualization on the SynER1 dataset. Darker colors indicate stronger causal effects.
Figure 5: Example of the Syntren dataset.
...and 6 more figures

Theorems & Definitions (5)

Theorem 1
proof
Theorem 2
proof
proof
