Learning causal graphs using variable grouping according to ancestral relationship

Ming Cai, Hisayuki Hara

TL;DR

The paper tackles the challenge of learning causal DAGs when the sample size $n$ is small relative to the number of variables $p$ by introducing CAG, a divide-and-conquer method that groups variables according to their ancestral relationships under the LiNGAM assumption. CAG first identifies ancestral relations via regression-based CI testing to form groups, then applies DirectLiNGAM within each group and merges the results to recover the full DAG. It includes a cubic-time ancestor-finding step and mechanisms for eliminating cycles. The approach yields consistent sub-DAG estimation and often improves estimation accuracy and computation time over existing divide-and-conquer methods, especially in sparse, high-dimensional settings. The method can be viewed as a principled alternative or complement to CAPA and RCD, offering a favorable trade-off between accuracy and efficiency for small-sample causal discovery in practice.
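The group-then-merge pipeline can be sketched in Python. This is a minimal illustration under stated assumptions, not CAG's actual procedure. Three rules are hypothetical stand-ins: each group is a variable together with its estimated ancestors, keeping only maximal sets; when an edge is estimated in several groups, the larger-magnitude coefficient wins; and a cycle in the merged graph is broken by dropping its weakest edge. The per-group DirectLiNGAM fits are represented by precomputed edge-weight dicts, and the function names are illustrative only.

```python
def group_by_ancestors(anc):
    """anc[j] = set of estimated ancestors of j (hypothetical grouping rule).

    Builds {j} | anc[j] for every j and keeps only the maximal sets.
    """
    cands = {frozenset({j}) | frozenset(a) for j, a in anc.items()}
    groups = []
    for g in sorted(cands, key=len, reverse=True):
        if not any(g <= h for h in groups):  # drop sets contained in a kept group
            groups.append(g)
    return groups


def find_cycle(nodes, edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on DFS stack, 2 = finished
    stack = []

    def dfs(u):
        state[u] = 1
        stack.append(u)
        for v in succ[u]:
            if state[v] == 1:  # back edge: the stack from v to u closes a cycle
                cyc = stack[stack.index(v):] + [v]
                return list(zip(cyc, cyc[1:]))
            if state[v] == 0:
                found = dfs(v)
                if found:
                    return found
        stack.pop()
        state[u] = 2
        return None

    for v in nodes:
        if state[v] == 0:
            found = dfs(v)
            if found:
                return found
    return None


def merge_subdags(sub_results):
    """Union per-group edge weights, then drop the weakest edge on each cycle."""
    merged = {}
    for edges in sub_results:
        for e, w in edges.items():
            # hypothetical tie-break: keep the larger-magnitude estimate
            if abs(w) > abs(merged.get(e, 0.0)):
                merged[e] = w
    nodes = {x for e in merged for x in e}
    while (cyc := find_cycle(nodes, merged)) is not None:
        weakest = min(cyc, key=lambda e: abs(merged[e]))
        del merged[weakest]
    return merged


# Usage with made-up ancestor sets and made-up per-group estimates.
anc = {0: set(), 1: {0}, 2: {0, 1}, 3: set(), 4: {3}}
groups = group_by_ancestors(anc)
sub_results = [{(0, 1): 0.9, (1, 2): 0.7}, {(1, 2): 0.5, (2, 0): 0.1}]
merged = merge_subdags(sub_results)
print(groups)
print(merged)
```

In this toy input, the two maximal groups are {0, 1, 2} and {3, 4}. The merged edges 0→1→2→0 form a cycle, and the sketch removes the weakest edge, 2→0.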

Abstract

Several causal discovery algorithms have been proposed. However, when the sample size is small relative to the number of variables, the accuracy of causal graphs estimated by existing methods decreases. Moreover, some methods cannot be applied at all when the sample size is smaller than the number of variables. To circumvent these problems, some researchers have proposed causal structure learning algorithms based on divide-and-conquer approaches. To learn the entire causal graph, these approaches first split the variables into several subsets according to the conditional independence relationships among them, then apply a conventional causal discovery algorithm to each subset and merge the estimated results. Because the divide-and-conquer approach reduces the number of variables to which a causal structure learning algorithm is applied, it is expected to improve the estimation accuracy of causal graphs, especially when the sample size is small relative to the number of variables and the model is sparse. However, existing methods are either computationally expensive or do not provide sufficient accuracy when the sample size is small. This paper proposes a new algorithm for grouping variables based on the ancestral relationships among them, under the LiNGAM assumption, where the causal relationships are linear and the mutually independent noise terms follow continuous non-Gaussian distributions. We call the proposed algorithm CAG. The time complexity of the ancestor-finding step in CAG is shown to be cubic in the number of variables. Extensive computer experiments confirm that the proposed method outperforms the original DirectLiNGAM applied without variable grouping, as well as other divide-and-conquer approaches, in both estimation accuracy and computation time when the sample size is small relative to the number of variables and the model is sparse.
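Regression-based ancestral testing relies on an asymmetry that LiNGAM creates. In the simplest two-variable case with no latent confounder, regressing the effect on its cause leaves a residual that is independent of the cause. The reverse regression leaves a residual that is uncorrelated with the regressor but, because the noise is non-Gaussian, not independent of it (Darmois–Skitovich theorem). The Python/NumPy sketch below checks this on a simulated model. The HSIC permutation test and the helper names (`hsic_perm_test`, `residual`) are assumptions for illustration. CAG's own test statistic may differ; Figure 5, for example, refers to Wald tests.

```python
import numpy as np


def _gram(v):
    """Gaussian-kernel Gram matrix, bandwidth set by the median-distance heuristic."""
    d = np.abs(v[:, None] - v[None, :])
    sigma = np.median(d[d > 0])
    return np.exp(-d**2 / (2 * sigma**2))


def _center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()


def hsic_perm_test(a, b, n_perm=200, seed=0):
    """Biased HSIC statistic and permutation p-value for H0: a and b independent."""
    n = len(a)
    Ka, Lb = _center(_gram(a)), _gram(b)
    stat = np.sum(Ka * Lb) / n**2  # equals trace(K H L H) / n^2
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)  # permuting a's Gram matrix = permuting a itself
        null[k] = np.sum(Ka[np.ix_(p, p)] * Lb) / n**2
    pval = (1 + np.sum(null >= stat)) / (n_perm + 1)
    return stat, pval


def residual(y, x):
    """OLS residual of y regressed on x (intercept included)."""
    xc, yc = x - x.mean(), y - y.mean()
    return yc - (xc @ yc) / (xc @ xc) * xc


# Usage: a two-variable LiNGAM x1 -> x2 with uniform (non-Gaussian) noise.
rng = np.random.default_rng(1)
n = 400
x1 = rng.uniform(-1, 1, n)
x2 = x1 + rng.uniform(-1, 1, n)

stat_fwd, p_fwd = hsic_perm_test(residual(x2, x1), x1)  # effect regressed on cause
stat_rev, p_rev = hsic_perm_test(residual(x1, x2), x2)  # cause regressed on effect
print(f"x2 on x1: HSIC={stat_fwd:.4f}, p={p_fwd:.3f}")
print(f"x1 on x2: HSIC={stat_rev:.4f}, p={p_rev:.3f}")
```

In the correct direction, the residual is essentially the noise of $x_2$, so its HSIC statistic stays near the permutation baseline. In the reverse direction, the spread of the residual depends on $x_2$, and HSIC picks up that dependence.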

Paper Structure (17 sections, 3 theorems, 12 equations, 5 figures, 9 tables)

Key Result

Proposition 1

One of the following four conditions holds for the ancestral relationship between $x_i$ and $x_j$.

Figures (5)

  • Figure 1: A causal DAG with nine variables.
  • Figure 2: An example of SADA not working.
  • Figure 3: $G_\sigma$ with $\sigma=1$ for the example DAG in Fig. 1
  • Figure 4: An example of the process of the CAG
  • Figure 5: $Anc_{rec}$ with Wald tests for some sample sizes

Theorems & Definitions (5)

  • Proposition 1 (Maeda and Shimizu, 2020)
  • Proposition 2 (Maeda and Shimizu, 2020)
  • Definition 1
  • Theorem 1
  • Proof