Learning Linear Polytree Structural Equation Models

Xingmei Lou, Yu Hu, Xiaodong Li

TL;DR

This work addresses learning the skeleton and CPDAG of Gaussian linear polytree SEMs from i.i.d. data, and provides sharp sample-size characterizations. The approach combines Chow-Liu skeleton recovery on pairwise correlations with threshold-based v-structure detection and Meek-style orientation, complemented by a PC-adapted variant and inverse-correlation-matrix estimation; it further extends to group polytree models. The main contributions include sufficiency results $n > O\left(\frac{\log p}{\rho_{\min}^2}\right)$ for skeleton and $n > O\left(\frac{\log p}{\rho_{\min}^4}\right)$ for CPDAG, accompanied by information-theoretic lower bounds that establish sharpness, plus $\ell_1$ error bounds for inverse-correlation estimation and a group-polytree extension. Empirical results on simulated and benchmark data demonstrate the method's accuracy and scalability, with robustness to approximate polytree structures and practical performance advantages over traditional DAG-learning methods.

Abstract

We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. We also consider an extension of group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
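The skeleton step described above can be sketched as a maximum-weight spanning tree over absolute pairwise sample correlations, followed by a simple correlation threshold to flag v-structures. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy SEM coefficients, and the threshold value `0.1` are all hypothetical, the v-structure rule used here (flag $i \to k \leftarrow j$ when the sample correlation of $i$ and $j$ is small) is a simplified stand-in for the paper's criterion, and the Meek-style orientation step is omitted.

```python
import itertools
import math
import random


def abs_correlations(rows):
    """Absolute pairwise sample correlations between the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    corr = [[1.0] * p for _ in range(p)]
    for i, j in itertools.combinations(range(p), 2):
        dot = sum(a * b for a, b in zip(centered[i], centered[j]))
        corr[i][j] = corr[j][i] = abs(dot / (norms[i] * norms[j]))
    return corr


def chow_liu_skeleton(corr):
    """Maximum-weight spanning tree on |correlation|, built with Kruskal."""
    p = len(corr)
    # Candidate edges, strongest absolute correlation first.
    pairs = sorted(
        ((corr[i][j], i, j) for i, j in itertools.combinations(range(p), 2)),
        reverse=True,
    )
    parent = list(range(p))  # union-find forest used to reject cycles

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = set()
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge (i, j) joins two different components
            parent[ri] = rj
            edges.add((i, j))  # stored with i < j
    return edges


def detect_v_structures(corr, edges, threshold):
    """Flag i -> k <- j when i and j share neighbor k and |corr(i, j)| is small.

    Simplified, hypothetical rule; `threshold` is a tuning parameter.
    """
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    found = set()
    for k, nbrs in neighbors.items():
        for i, j in itertools.combinations(sorted(nbrs), 2):
            if corr[i][j] < threshold:
                found.add((i, k, j))
    return found


def simulate_toy_polytree(n, seed):
    """Toy Gaussian linear SEM on the polytree 0 -> 2 <- 1, 2 -> 3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = rng.gauss(0, 1)
        x2 = 0.8 * x0 + 0.8 * x1 + rng.gauss(0, 1)
        x3 = 0.9 * x2 + rng.gauss(0, 1)
        rows.append([x0, x1, x2, x3])
    return rows


corr = abs_correlations(simulate_toy_polytree(n=5000, seed=0))
skeleton = chow_liu_skeleton(corr)
v_structures = detect_v_structures(corr, skeleton, threshold=0.1)
print(sorted(skeleton), sorted(v_structures))
```

In this toy model the two parents of node 2 are uncorrelated in the population, while every non-collider pair through node 2 has a population correlation near 0.43, so a threshold between the two separates them once the sample is large enough.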

Paper Structure

This paper contains 22 sections, 18 theorems, 80 equations, 5 figures, 4 tables, 3 algorithms.

Key Result

Proposition 2.1

The undirected subgraph consisting of the undirected edges of the CPDAG of a polytree forms a forest. All equivalent DAGs can be obtained by orienting each undirected tree of the forest into a rooted tree, that is, by selecting any node as the root and orienting all edges away from it.
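The rooting operation in Proposition 2.1 can be illustrated with a short breadth-first traversal. This is a sketch for a single undirected tree given as an edge list; the function name `orient_from_root` and the example tree are assumptions made for illustration.

```python
from collections import deque


def orient_from_root(edges, root):
    """Orient an undirected tree so that every edge points away from `root`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    directed, seen, queue = set(), {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:  # v is reached for the first time, via u
                seen.add(v)
                directed.add((u, v))  # edge u -> v points away from the root
                queue.append(v)
    return directed


# Undirected path 0 - 1 - 2: each choice of root gives one orientation.
path = [(0, 1), (1, 2)]
orientations = {frozenset(orient_from_root(path, r)) for r in (0, 1, 2)}
print(len(orientations))
```

Because each root yields a distinct orientation with no node receiving two incoming edges, an undirected tree on $m$ nodes contributes $m$ possible orientations, none of which creates a new v-structure.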

Figures (5)

  • Figure 1: Performance on the polytree simulated data at $p=100$ and maximum in-degree $d_{*}=10$. The results from the algorithms are represented by solid lines and dot markers (polytree), dashed lines and triangle markers (hill climbing), solid lines and square markers (PC), and dash-dot lines and square markers (PC early stopped). Colors correspond to three different values of $\rho_{\min}$. The remaining SEM parameters are $\rho_{\max}=0.8$ and $\omega_{\min}=0.1$. Panels A and C show the FDR (smaller is better) for skeleton and CPDAG recovery. Panels B and D show the Jaccard index (larger is better). For each combination of SEM parameters, we randomly generate a polytree; the generation of the $\beta_{ij}$'s and $\omega_{ii}$'s is described in the supplementary section on simulated polytree data. Then we draw i.i.d. samples of different sizes from the SEM (the x-axis, $n=50,100,200,400,600,800,1000$). This entire process is repeated 100 times. Each point on the curves shows the average over the 100 repeats, and the error bars are 1.96 times the standard error of the mean (many are smaller than the marker).
  • Figure 2: Same as Figure 1 but for a maximum in-degree of $d_{*}=20$.
  • Figure 3: Comparison of the true CPDAG of the ALARM data with the CPDAGs inferred by the four algorithms at $n=5000$. There are 37 nodes and 46 edges in the true CPDAG.
  • Figure 4: The true CPDAG and the typical inferred CPDAG for the ASIA data with $n=5000$ samples. We plot the most likely inferred graph across 1000 bootstraps for each algorithm, which occurs at 23% (Chow-Liu), 44% (hill climbing), 42% (PC), 50% (early-stopping PC), respectively.
  • Figure 5: The true CPDAG and the most frequently inferred CPDAG for the EARTHQUAKE data with $n=2000$ samples over 1000 trials. The graph shown occurs at 90% for Chow-Liu, 47% for hill climbing, 46% for PC, and 41% for early-stopping PC, respectively.
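The two metrics named in the Figure 1 caption, together with its error-bar convention, can be sketched for undirected edge sets as follows. The definitions used here are the standard ones (FDR as the fraction of inferred edges that are not in the true graph, Jaccard index as intersection over union) and are assumed rather than taken from the paper; the function names and the empty-set conventions are hypothetical.

```python
import math
import statistics


def as_undirected(edges):
    """Represent each edge as an unordered pair."""
    return {frozenset(e) for e in edges}


def fdr(inferred, true):
    """Fraction of inferred edges that are absent from the true edge set."""
    inferred, true = as_undirected(inferred), as_undirected(true)
    return len(inferred - true) / len(inferred) if inferred else 0.0


def jaccard(inferred, true):
    """Size of the intersection divided by the size of the union."""
    inferred, true = as_undirected(inferred), as_undirected(true)
    union = inferred | true
    return len(inferred & true) / len(union) if union else 1.0


def mean_and_error_bar(values):
    """Mean and 1.96 times the standard error of the mean."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), 1.96 * sem


inferred = [(0, 1), (1, 2), (2, 3)]
true = [(1, 0), (1, 2), (3, 4)]
print(fdr(inferred, true), jaccard(inferred, true))
```

For CPDAG recovery the same counts would need to distinguish directed from undirected edges, which this skeleton-level sketch does not attempt.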

Theorems & Definitions (40)

  • Proposition 2.1
  • proof
  • Definition 1: Chow-Liu tree associated to pairwise sample correlations
  • Lemma 3.1
  • Remark 1
  • Remark 2
  • Definition 2
  • Corollary 3.2
  • Remark 3
  • Lemma 3.3
  • ...and 30 more