Sparse Additive Model Pruning for Order-Based Causal Structure Learning

Kentaro Kanamori, Hirofumi Suzuki, Takuya Takagi

TL;DR

A new pruning method based on sparse additive models is introduced, which enables direct pruning of redundant edges without relying on hypothesis testing and is significantly faster than existing pruning methods while maintaining comparable or superior accuracy.

Abstract

Causal structure learning, also known as causal discovery, aims to estimate causal relationships between variables in the form of a causal directed acyclic graph (DAG) from observational data. One of the major frameworks is the order-based approach, which first estimates a topological order of the underlying DAG and then prunes spurious edges from the fully-connected DAG induced by the estimated topological order. Previous studies often focus on the former ordering step because it can dramatically reduce the search space of DAGs. In practice, the latter pruning step is equally crucial for ensuring both computational efficiency and estimation accuracy. Most existing methods employ a pruning technique based on generalized additive models and hypothesis testing, commonly known as CAM-pruning. However, this approach can be a computational bottleneck because it requires repeatedly fitting additive models for all variables. Furthermore, it may harm estimation quality due to multiple testing. To address these issues, we introduce a new pruning method based on sparse additive models, which enables direct pruning of redundant edges without relying on hypothesis testing. We propose an efficient algorithm for learning sparse additive models by combining the randomized tree embedding technique with group-wise sparse regression. Experimental results on both synthetic and real datasets demonstrate that our method is significantly faster than existing pruning methods while maintaining comparable or superior accuracy.
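To make the "fully-connected DAG induced by the estimated topological order" concrete, the following is a minimal sketch, not taken from the paper's code. The helper name `candidate_parents` is hypothetical; the order $(2, 3, 1, 4)$ is the one used in the Figure 2 example.

```python
def candidate_parents(order):
    """Map each variable to the variables that precede it in the order.

    These are the parents in the fully-connected DAG induced by the order;
    a pruning step would then remove the spurious ones.
    """
    return {v: list(order[:k]) for k, v in enumerate(order)}


order = (2, 3, 1, 4)               # estimated topological order (Figure 2 example)
cands = candidate_parents(order)   # e.g. cands[4] lists the candidates for X_4
n_edges = sum(len(p) for p in cands.values())  # d * (d - 1) / 2 edges before pruning
```

Every variable starts with all of its predecessors as candidate parents, so the pruning step has to examine $d(d-1)/2$ candidate edges in total.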

Paper Structure

This paper contains 31 sections, 1 theorem, 6 equations, 12 figures, 1 table, and 1 algorithm.

Key Result

Proposition 1

For a variable $X_j$, we assume $X_j \in [a_j, b_j]$ for some $a_j < b_j$. Then, for any continuous function $g^\ast \colon [a_j, b_j] \to \mathbb{R}$ and $\varepsilon > 0$, there exists a shape function $g_{i, j}$ of our form such that $\max_{x_j \in [a_j, b_j]} |g^\ast(x_j) - g_{i, j}(x_j)| < \varepsilon$ holds.

Figures (12)

  • Figure 1: An overview of the order-based causal structure learning algorithm. Given an observational dataset, it first estimates a topological order of the underlying causal DAG. Then, it prunes spurious edges from the fully-connected DAG induced by the estimated topological order. This paper focuses on the latter step and aims to propose an efficient and accurate pruning method.
  • Figure 2: An example of our SARTRE framework. We consider the same example as in Figure 1, where the estimated topological order $\hat{\pi} = (2, 3, 1, 4)$ is given and we aim to identify the parents of the variable $X_4$ among its candidate parents $\hat{\operatorname{pa}}(4) = \{ 2, 3, 1 \}$. (a) Our method first generates binary representation vectors $(\phi_2(X_2), \phi_3(X_3), \phi_1(X_1))$ by randomized tree embedding. Then, it learns an additive model $\hat{f}_4(X_{\hat{\operatorname{pa}}(4)}) = g_{4, 2}(X_2) + g_{4, 3}(X_3) + g_{4, 1}(X_1)$, where each shape function is defined by $g_{4, j}(X_j) = \bm{\beta}_{4, j}^\top \phi_j(X_j)$. By optimizing the coefficient vectors $(\bm{\beta}_{4, 2}, \bm{\beta}_{4, 3}, \bm{\beta}_{4, 1})$ through group lasso regression, we obtain a sparse additive model $\hat{f}_4$. (b) For each shape function $g_{4, j}$, if $\bm{\beta}_{4, j} = \bm{0}$ holds, we have $g_{4, j}(X_j) = 0$ for any $X_j$, which enables us to prune the corresponding candidate parent $X_j$. In this example, $\bm{\beta}_{4, 3} = \bm{0}$ holds and thus we can prune $X_3$.
  • Figure 3: An example of the tree embedding technique. Given an ensemble of decision trees that takes $X_j$ as input, each leaf $k$ of a tree corresponds to an interval $r_{j, k}$ of $X_j$. Thus, we can obtain a set of intervals $R_j$ by collecting intervals corresponding to the leaves in a given tree ensemble.
  • Figure 4: Experimental results of baseline comparison on the synthetic datasets. For all the metrics, smaller values are better. The shaded areas indicate the standard deviations over $10$ trials. We varied the number of variables $d$ from $10$ to $50$. Our SARTRE was significantly faster than the baselines while maintaining comparable or superior SHD and SID.
  • Figure 5: Experimental results of baseline comparison by varying the number of samples $n$ from $1000$ to $5000$ on ER1 and ER4 datasets. Our SARTRE was faster than the baselines without significantly degrading SHD and SID.
  • ...and 7 more figures
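
The pipeline described in the captions of Figures 2 and 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `interval_embedding` stands in for randomized tree embedding by drawing random split points on a single variable (each "tree" yields one interval per leaf, as in Figure 3), and `group_lasso` is a plain proximal-gradient solver with features and target centered so that no intercept is needed. The function names, the synthetic data, and the value `lam=0.1` are all hypothetical choices made for illustration.

```python
import numpy as np


def interval_embedding(x, rng, n_trees=5, n_splits=3):
    """Binary features: one-hot leaf (interval) membership per random 'tree'."""
    blocks = []
    for _ in range(n_trees):
        cuts = np.sort(rng.uniform(x.min(), x.max(), size=n_splits))
        leaf = np.searchsorted(cuts, x)            # leaf index in 0..n_splits
        blocks.append(np.eye(n_splits + 1)[leaf])  # one-hot encoding of the leaf
    return np.hstack(blocks)


def group_lasso(phi_groups, y, lam, n_iter=300):
    """Minimize (1/2n)||y - sum_j Phi_j b_j||^2 + lam * sum_j ||b_j||_2."""
    n = len(y)
    phi = np.hstack(phi_groups)
    phi = phi - phi.mean(axis=0)                   # center features
    y = y - y.mean()                               # center target
    sizes = [p.shape[1] for p in phi_groups]
    idx = np.split(np.arange(phi.shape[1]), np.cumsum(sizes)[:-1])
    step = n / np.linalg.norm(phi, 2) ** 2         # 1 / Lipschitz constant
    beta = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        z = beta - step * (phi.T @ (phi @ beta - y) / n)
        for g in idx:                              # block soft-thresholding
            norm = np.linalg.norm(z[g])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            beta[g] = scale * z[g]
    return [beta[g] for g in idx]


# Synthetic stand-in for the Figure 2 example: X_4 depends on X_2 and X_1 only.
rng = np.random.default_rng(0)
n = 2000
X = {j: rng.uniform(-1, 1, n) for j in (1, 2, 3)}
x4 = np.sin(3 * X[2]) + 2 * X[1] ** 2 + 0.1 * rng.standard_normal(n)

cand = [2, 3, 1]                                   # candidate parents of X_4
betas = group_lasso([interval_embedding(X[j], rng) for j in cand], x4, lam=0.1)
kept = [j for j, b in zip(cand, betas) if np.linalg.norm(b) > 0]
```

Because the group penalty zeroes an entire coefficient vector at once, a candidate parent is pruned exactly when its group is zero, with no hypothesis test involved. In this synthetic setup the group for $X_3$ is expected to be zeroed while those for $X_2$ and $X_1$ survive; the outcome depends on the regularization strength, which would need tuning in practice.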

Theorems & Definitions (3)

  • Proposition 1
  • proof : Proof (sketch)
  • proof : Proof of Proposition 1