Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration

Hans Jarett J. Ong, Brian Godwin S. Lim, Renzo Roel P. Tan, Kazushi Ikeda

TL;DR

This work addresses unmeasured confounding in LiNGAM by building on the reformulation of causal ordering as a shortest-path problem (LiNGAM-SPP) and introducing three enhancements. First, it replaces kNN-based mutual information with the pairwise likelihood ratio (PLR), removing the need for parameter tuning and improving efficiency. Second, it allows prior knowledge, given as relative orderings, to be incorporated through a node-skipping mechanism, increasing adaptability. Third, it uses the distribution of path measures across all causal orderings to infer properties of the true graph (e.g., presence of confounders, sparsity) and to predict the reliability of causal-discovery methods, supported by path enumeration using ZDDs and multiple ML predictors. Together, these advances yield a more practical, scalable, and informative LiNGAM-SPP framework with demonstrated gains on simulated and real-world data.

Abstract

Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) determines the causal graph under the assumption of a linear data generating process with non-Gaussian noise. Its assumption that unmeasured confounders are absent, however, poses practical limitations. In response, empirical research has shown that reformulating LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information serves as the measure of independence. This choice introduces a challenge: the kNN mutual information estimators it relies on require parameter tuning. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. Second, prior knowledge is incorporated through a node-skipping strategy applied to the graph representation of all causal orderings, which eliminates orderings that violate the relative orderings provided as input. This offers more flexibility than existing approaches. The third enhancement uses the distribution of paths in the graph representation of all causal orderings. From this distribution, crucial properties of the true causal graph, such as the presence of unmeasured confounders and sparsity, may be inferred. To some extent, the expected performance of the causal discovery algorithm may also be predicted. These refinements advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.
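
To make the idea of a tuning-free pairwise measure concrete, here is a minimal sketch of one well-known approximation of the pairwise likelihood ratio: the tanh-based contrast for standardized variables. The abstract does not say which PLR estimator the paper uses, so this particular form, the function names `standardize` and `plr_tanh`, and the simulated Laplace data are all assumptions made for illustration. Under this approximation, and for super-Gaussian (sparse) noise, a positive score favours x -> y and a negative score favours y -> x.

```python
import math
import random
from statistics import fmean, pstdev


def standardize(values):
    """Return the values shifted to zero mean and scaled to unit variance."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def plr_tanh(x, y):
    """Tanh-based approximation of the pairwise likelihood ratio (illustrative).

    Assumes super-Gaussian noise. Positive: evidence for x -> y.
    Negative: evidence for y -> x. There is no neighbourhood size to tune.
    """
    xs, ys = standardize(x), standardize(y)
    rho = fmean([a * b for a, b in zip(xs, ys)])
    contrast = fmean(
        [a * math.tanh(b) - math.tanh(a) * b for a, b in zip(xs, ys)]
    )
    return rho * contrast


def laplace(rng):
    """Draw one Laplace sample as the difference of two exponentials."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


# Simulated example (hypothetical data): x causes y with Laplace noise.
rng = random.Random(0)
n = 20000
x = [laplace(rng) for _ in range(n)]
y = [0.8 * xi + laplace(rng) for xi in x]

score_xy = plr_tanh(x, y)  # expected to be positive for this simulation
score_yx = plr_tanh(y, x)  # equals -score_xy by construction
print(score_xy, score_yx)
```

The score is antisymmetric in its arguments, so a single evaluation per pair is enough to decide the direction, in contrast to a kNN estimator whose result depends on the chosen number of neighbours.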

Paper Structure

This paper contains 17 sections, 14 equations, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: LiNGAM-SPP: A shortest-path formulation of the causal ordering problem. Adapted from the LiNGAM-SPP paper (suzuki_lingam_mmi) with slight changes in notation.
  • Figure 2: Performance Impact of Prior Knowledge: Combined results for $p = 4,8,12$.
  • Figure 3: Log-transformed Path Distributions for Various Scenarios
  • Figure 4: ROC Curve and Confusion Matrix of the Confounder Detector (CatBoost)
  • Figure 5: ROC Curve and Confusion Matrix of the Sparsity Estimator (CatBoost)
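
The shortest-path formulation (Figure 1), the node-skipping treatment of prior knowledge (Figure 2), and the path distributions (Figure 3) can be illustrated with a small sketch. It treats each node as the set of variables already placed in the ordering, finds the cheapest path from the empty set to the full set, and skips any node that contains a variable without its required predecessor. Everything named here is hypothetical: `shortest_ordering`, `all_path_costs`, the `must_precede` argument, and especially `toy_cost`, which counts missing parents in a known toy graph and merely stands in for a data-driven measure such as the PLR. The enumeration uses brute-force permutations, not the ZDD-based enumeration mentioned in the paper, so it is only practical for a handful of variables.

```python
from itertools import permutations


def shortest_ordering(variables, step_cost, must_precede=()):
    """Cheapest causal ordering via dynamic programming over subsets.

    step_cost(placed, v): hypothetical cost of placing v after the set `placed`.
    must_precede: pairs (a, b) meaning a must come before b. A node (subset)
    that contains b but not a violates the prior knowledge and is skipped.
    Returns (cost, ordering), or None if the constraints cannot be satisfied.
    """
    variables = tuple(variables)

    def allowed(node):
        return not any(b in node and a not in node for a, b in must_precede)

    layer = {frozenset(): (0, ())}
    for _ in variables:
        next_layer = {}
        for placed, (cost, order) in layer.items():
            for v in variables:
                if v in placed:
                    continue
                node = placed | {v}
                if not allowed(node):
                    continue  # node skipping
                new_cost = cost + step_cost(placed, v)
                if node not in next_layer or new_cost < next_layer[node][0]:
                    next_layer[node] = (new_cost, order + (v,))
        layer = next_layer
    return layer.get(frozenset(variables))


def all_path_costs(variables, step_cost):
    """Cost of every ordering (brute force; a stand-in for ZDD enumeration)."""
    costs = []
    for order in permutations(variables):
        placed, total = frozenset(), 0
        for v in order:
            total += step_cost(placed, v)
            placed = placed | {v}
        costs.append(total)
    return costs


# Toy stand-in for a dependence measure: true graph x1 -> x2 -> x3.
TOY_PARENTS = {"x1": set(), "x2": {"x1"}, "x3": {"x2"}}


def toy_cost(placed, v):
    """Number of true parents of v that have not been placed yet."""
    return len(TOY_PARENTS[v] - placed)


names = ["x1", "x2", "x3"]
free = shortest_ordering(names, toy_cost)
constrained = shortest_ordering(names, toy_cost, must_precede=[("x3", "x1")])
distribution = sorted(all_path_costs(names, toy_cost))
print(free, constrained, distribution)
```

In this toy setting the unconstrained search recovers the ordering x1, x2, x3 at zero cost, while forcing x3 before x1 removes that path and raises the best achievable cost. The sorted list of all path costs is the kind of distribution from which, per the abstract, properties of the true graph may be inferred.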