Score-matching-based Structure Learning for Temporal Data on Networks

Hao Chen, Kai Yi, Lin Liu, Yu Guang Wang

TL;DR

We develop a new parent-finding subroutine for leaf nodes in DAGs that significantly accelerates the pruning step, the most time-consuming part of score-matching-based causal discovery. The resulting, more efficient score-matching algorithm is termed Parent Identification-based Causal structure learning for both i.i.d. and temporal data on networKs, or PICK.

Abstract

Causal discovery is a crucial first step in establishing causality from empirical data and background knowledge. Among the many algorithms developed for this purpose, score-matching methods have demonstrated superior performance across various evaluation metrics, particularly for the commonly encountered Additive Nonlinear Causal Models. However, current score-matching-based algorithms are primarily designed for independent and identically distributed (i.i.d.) data. More importantly, they suffer from high computational complexity due to the pruning step required for handling dense Directed Acyclic Graphs (DAGs). To improve the scalability of score matching, we develop a new parent-finding subroutine for leaf nodes in DAGs that significantly accelerates the pruning step, the most time-consuming part of the process. The resulting, more efficient score-matching algorithm is termed Parent Identification-based Causal structure learning for both i.i.d. and temporal data on networKs, or PICK. PICK extends the scope of existing algorithms and can handle static and temporal data on networks with weak network interference. It copes efficiently with increasingly complex datasets that exhibit spatial and temporal dependencies, as commonly encountered in academia and industry, and accelerates score-matching-based methods while maintaining high accuracy in real-world applications.
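To make the data setting concrete, the following is a minimal sketch of how temporal data from an additive nonlinear causal model could be simulated. It is not the authors' generator: the function name `simulate_temporal_anm`, the single lag, the sine link, the Gaussian noise, and the adjacency conventions are all assumptions chosen for illustration, and network interference is not modeled.

```python
import numpy as np

def simulate_temporal_anm(intra, inter, n_steps, noise_std=1.0, seed=0):
    """Simulate a lag-1 additive nonlinear model (illustrative assumption).

    intra[j, i] = 1 means X_j^(t)   -> X_i^(t)  (same snapshot).
    inter[j, i] = 1 means X_j^(t-1) -> X_i^(t)  (previous snapshot).
    `intra` must be strictly upper triangular, so that node indices
    0..d-1 already form a topological order of the intra-snapshot DAG.
    """
    intra = np.asarray(intra, dtype=float)
    inter = np.asarray(inter, dtype=float)
    if np.any(np.tril(intra) != 0):
        raise ValueError("intra must be strictly upper triangular")
    rng = np.random.default_rng(seed)
    d = intra.shape[0]
    x = np.zeros((n_steps, d))
    for t in range(n_steps):
        # Contribution of lagged parents; there is no history at t = 0.
        lagged = np.sin(x[t - 1]) @ inter if t > 0 else np.zeros(d)
        for i in range(d):
            # Contribution of same-snapshot parents (all have index < i).
            same_time = np.sin(x[t, :i]) @ intra[:i, i]
            x[t, i] = lagged[i] + same_time + noise_std * rng.standard_normal()
    return x

# Hypothetical 3-node example: chain 0 -> 1 -> 2 within a snapshot,
# plus each node depending on its own previous value.
intra_example = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
inter_example = np.eye(3)
series = simulate_temporal_anm(intra_example, inter_example, n_steps=200)
```

Each row of `series` is one snapshot. A structure learning method for this setting would aim to recover both `intra_example` and `inter_example` from `series` alone.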

Paper Structure

This paper contains 34 sections, 8 theorems, 58 equations, 14 figures, 2 tables, 4 algorithms.

Key Result

Lemma 1

Let $\Bar{X}^{(t)} \coloneqq (X^{(t)},\dots,X^{(t-p)})$. For any node $i$, we have: (i) Node $i$ is a leaf $\iff$ $\forall x$ in the sample space of $X_{i}$, $\frac{\partial \mathsf{s}_i^{(t)}}{\partial x_i^{(t)}}(x)\equiv c$ for some constant $c$ that is independent of $x$, or equivalently $\mathrm
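The leaf criterion in part (i) can be checked numerically on a toy model. The sketch below assumes the static case ($p=0$) with two nodes, $X_1 \sim N(0,1)$ and $X_2 = \sin(X_1) + N(0,1)$, and uses the analytic score derivative of this toy density rather than an estimated one. The model and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_toy(n, seed=0):
    """Draw n samples from X1 ~ N(0, 1), X2 = sin(X1) + N(0, 1)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = np.sin(x1) + rng.standard_normal(n)
    return np.stack([x1, x2], axis=1)

def score_derivative_diagonal(x):
    """Analytic d s_i / d x_i for the toy model.

    log p(x) = -x1^2 / 2 - (x2 - sin x1)^2 / 2 + const, hence
      d s_1 / d x_1 = -1 - cos(x1)^2 - (x2 - sin x1) * sin(x1)
      d s_2 / d x_2 = -1
    """
    x1, x2 = x[:, 0], x[:, 1]
    resid = x2 - np.sin(x1)
    d1 = -1.0 - np.cos(x1) ** 2 - resid * np.sin(x1)
    d2 = -np.ones_like(x2)
    return np.stack([d1, d2], axis=1)

samples = sample_toy(5000)
# Per-node variance of the diagonal score derivative over the samples.
diag_var = score_derivative_diagonal(samples).var(axis=0)
# The node whose derivative is constant (zero variance) is the leaf.
leaf = int(np.argmin(diag_var))
```

Here `leaf` is index 1, that is $X_2$: its score derivative is constant in $x$, while the derivative for the parent $X_1$ varies with $x$ because the child's term enters it. In practice the score is not known and has to be estimated from data, so the variance of a true leaf would be small rather than exactly zero.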

Figures (14)

  • Figure 1: The left panel provides a brief overview of the main framework of the PICK algorithm. The topological ordering and parent identification procedures are executed within a loop, with each iteration identifying one leaf node and its corresponding parent nodes. The right panel illustrates the intuition behind our parent identification subroutine. Notably, in the causal DAG (upper panel), although node $d_1$ is the parent (cause) of nodes $d_2$, $d_3$, and $d_4$, the variance of its score function is influenced by its three child nodes.
  • Figure 2: SHD results of PICK-t and baselines for predicted intra-snapshot and inter-snapshot causal graph with link function $f_{i}^{(t)}(x_i)=\sum\limits_{j\in\mathsf{pa}(i)}\sin{x_j}$.
  • Figure 3: SHD for predicted and ground truth causal graph with link function $f_{i}^{(t)}$ generated by sampling Gaussian process with a unit bandwidth RBF kernel. The upper and lower rows show the results for low dimension and high dimension respectively.
  • Figure 4: Running time for predicted and ground truth causal graph with link function $f_{i}^{(t)}$ generated by sampling Gaussian process with a unit bandwidth RBF kernel. The upper and lower rows show the results for low dimension and high dimension respectively.
  • Figure 5: FDR for predicted inter-snapshot causal graph and ground truth inter-snapshot causal graph with link function $f_{i}^{(t)}(x_i)=\sum\limits_{j\in\mathsf{pa}(i)}\sin{x_j}$.
  • ...and 9 more figures
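
The figures above report SHD (structural Hamming distance) and FDR (false discovery rate) between predicted and ground-truth graphs. As a rough sketch of how such metrics are commonly computed from binary adjacency matrices, consider the code below. The exact conventions used in the paper are not stated here, so the choices made (a reversed edge counted as one SHD error by default, FDR defined as zero when nothing is predicted) are assumptions.

```python
import numpy as np

def shd(pred, true, count_reversal_once=True):
    """Structural Hamming distance between two binary adjacency matrices.

    Entry [j, i] = 1 denotes an edge j -> i. With count_reversal_once=True,
    a reversed edge counts as one error instead of two. That option only
    makes sense for same-snapshot graphs; for lagged (inter-snapshot)
    matrices, where [j, i] and [i, j] are unrelated edges, pass False.
    """
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    mismatches = int((pred != true).sum())
    if not count_reversal_once:
        return mismatches
    # Predicted j -> i where the truth has i -> j instead.
    reversed_edges = pred & ~true & true.T & ~pred.T
    return mismatches - int(reversed_edges.sum())

def fdr(pred, true):
    """Fraction of predicted edges that are absent from the true graph."""
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    n_pred = int(pred.sum())
    if n_pred == 0:
        return 0.0  # convention assumed here: no predictions, no false ones
    return float((pred & ~true).sum() / n_pred)

# Hypothetical example. Truth: 0 -> 1 and 1 -> 2.
true_graph = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
# Prediction: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra).
pred_graph = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0]])
shd_value = shd(pred_graph, true_graph)
fdr_value = fdr(pred_graph, true_graph)
```

In this example `shd_value` is 2 (one reversal plus one extra edge) and `fdr_value` is 2/3 (two of the three predicted edges are not in the true graph).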

Theorems & Definitions (14)

  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Proposition 1
  • Theorem 2
  • Corollary 2
  • proof
  • proof
  • proof
  • Lemma 2
  • ...and 4 more