Inference of Causal Networks using a Topological Threshold

Filipe Barroso; Diogo Gomes; Gareth J. Baxter

TL;DR

The paper addresses causal-network inference from data, replacing ad hoc thresholds with automatic topological criteria. It introduces Net Influence (NI), a state-wise, asymmetric measure, and a two-stage constraint-based algorithm: the first stage determines a data-driven threshold by a topological method (Connected or Knee), and the second prunes edges using conditional independence (CI) tests. Across real and synthetic networks, the NI-based Knee method gives fast inference and recovers edge directions well, often surpassing the PC benchmark in both speed and accuracy, especially for larger networks. The work offers a scalable approach to DAG discovery from discrete data, with principled thresholding and edge orientation, suited to applications in engineering and science where datasets are large and causal insight is essential.

Abstract

We propose a constraint-based algorithm that infers causal networks from data and automatically determines the causal relevance thresholds it needs. We call these topological thresholds. We present two methods for determining the threshold: the first seeks a set of edges that leaves no disconnected nodes in the network; the second seeks a large causally connected component in the data. We tested these methods on both synthetic and real discrete data, and compared the results with those obtained with the PC algorithm, which we took as the benchmark. We show that the new algorithm is generally faster and more accurate than the PC algorithm. Determining the thresholds requires choosing a measure of causality. We tested our methods with Fisher Correlations, commonly used in the PC algorithm (for instance in \cite{kalisch2005}), and further propose a discrete, asymmetric measure of causality, which we call Net Influence, that gave very good results when inferring causal networks from discrete data. This measure allows the direction of the edges to be inferred while the thresholds are applied, which speeds up the inference of causal DAGs.
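The first thresholding method can be illustrated with a small sketch. It assumes each candidate edge already carries a causal relevance weight (for example a Net Influence or correlation magnitude), removes edges weakest-first, and tracks the size of the largest connected component (LCC). The Connected threshold is then the last point before any node drops out of the LCC. The function names, the edge-tuple format and the tie handling are illustrative assumptions, not the authors' implementation.

```python
def lcc_curve(n_nodes, weighted_edges):
    """LCC size after removing the r weakest edges, for r = 0..len(edges).

    weighted_edges: list of (u, v, weight), nodes labelled 0..n_nodes-1.
    Edge direction is ignored here; only connectivity matters.
    """
    # Adding edges strongest-first is weakest-first removal played backwards.
    strongest_first = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 1 if n_nodes else 0
    sizes_by_kept = [largest]              # entry m = LCC size with m edges kept
    for u, v, _ in strongest_first:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        sizes_by_kept.append(largest)
    return sizes_by_kept[::-1]             # index r = number of edges removed


def connected_threshold(n_nodes, weighted_edges):
    """Hypothetical helper: weakest weight kept before the LCC first shrinks."""
    curve = lcc_curve(n_nodes, weighted_edges)
    # The curve is non-increasing, so take the last r that preserves the LCC.
    removable = max(r for r, s in enumerate(curve) if s == curve[0])
    strongest_first = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    kept = strongest_first[:len(strongest_first) - removable]
    threshold = kept[-1][2] if kept else None
    return threshold, kept


# Toy example: 4 nodes, 5 weighted candidate edges (made-up weights).
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7), (0, 2, 0.3), (1, 3, 0.1)]
threshold, kept = connected_threshold(4, edges)
```

The Knee method would instead select the point of greatest curvature on the same LCC curve. How that curvature is estimated is not specified in this summary, so it is left out of the sketch.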

Paper Structure (14 sections, 4 equations, 12 figures, 2 tables, 1 algorithm)

Introduction
Inference of causal networks
Net Influence
Inference algorithm
First stage
Connected method
Knee method
Second stage
Computational complexity
Results for real-world networks
ASIA network
ALARM network
Results for synthetic networks
Discussion and Conclusions

Figures (12)

Figure 1: An illustration of the influence of state $j_1\in J$ on state $i_1 \in I$, as measured by Net Influence $W\left(i_1\mid j_1\right)$. Net Influence is the difference between the probability of state $i_1$ in the presence of $j_1$ and the probability of $i_1$ under any other state of $J$; it can be decomposed as a weighted mean of the differences of the $P\left(i_1\mid j_k\right)$.
Figure 2: Number of nodes in the largest connected component (LCC) of the network, as a function of the number of edges removed, for the ALARM network (analysed below). The Connected method finds the point before a node drops from the LCC (orange marker). The Knee method finds the point of greatest curvature (red marker).
Figure 3: ASIA network. Although the network is named after the node Asia, the edge from that node, Asia$\rightarrow$Tuberculosis, is not easy to predict.
Figure 4: Inferred skeleton of the ALARM network using NIKnee. Labels are omitted to avoid cluttering the image. Black lines represent correctly predicted edges, green lines represent excess edges (false positives), and red lines represent missing edges (false negatives). Note that two nodes were left disconnected.
Figure 5: Variation of the FPR score of the inferred DAG with network size, for networks generated by method 2 with mean degree close to $3$. Bars represent the range of results (not the uncertainty in the mean value). Each point is a mean over $30$ repetitions, except for networks of $120$ and $200$ nodes, which were computed with $22$ and $9$ repetitions, respectively.
...and 7 more figures
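The Net Influence described in the Figure 1 caption can be illustrated with a short sketch. It assumes the data for a pair of discrete variables are summarised as a contingency table and that every state of $J$ is observed at least once. The weights $P(j_k)/(1-P(j_1))$ in the second function follow from the law of total probability rather than from the paper's text, and both function names are hypothetical.

```python
def net_influence(counts, i, j):
    """W(i | j) = P(I=i | J=j) - P(I=i | J!=j), from a contingency table.

    counts[a][b] is the number of samples with I = a and J = b.
    """
    with_j = sum(row[j] for row in counts)                 # samples with J = j
    without_j = sum(sum(row) for row in counts) - with_j   # samples with J != j
    p_i_given_j = counts[i][j] / with_j
    p_i_given_not_j = (sum(counts[i]) - counts[i][j]) / without_j
    return p_i_given_j - p_i_given_not_j


def net_influence_weighted(counts, i, j):
    """Same quantity as a weighted mean of P(i|j) - P(i|k) over k != j."""
    n_states = len(counts[0])
    col_totals = [sum(row[k] for row in counts) for k in range(n_states)]
    without_j = sum(col_totals) - col_totals[j]
    p_i_given = [counts[i][k] / col_totals[k] for k in range(n_states)]
    # Weight of state k is P(J=k) / (1 - P(J=j)), i.e. col_totals[k] / without_j.
    return sum(
        (col_totals[k] / without_j) * (p_i_given[j] - p_i_given[k])
        for k in range(n_states)
        if k != j
    )


# Made-up table: I has 2 states (rows), J has 3 states (columns).
counts = [[30, 10, 5],
          [10, 20, 25]]
w_i_given_j = net_influence(counts, 0, 0)

# Swapping the roles of I and J gives a different value: the measure is asymmetric.
transposed = [list(col) for col in zip(*counts)]
w_j_given_i = net_influence(transposed, 0, 0)
```

In this toy table the two directions give different values, which is the asymmetry the abstract relies on when orienting edges.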
