
CoLiDE: Concomitant Linear DAG Estimation

Seyed Saman Saboksayr, Gonzalo Mateos, Mariano Tepper

TL;DR

A new convex score function is proposed for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels.
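To make concomitant scale estimation concrete, here is a minimal NumPy sketch for the equal-noise-variance case. The score below is an assumption modeled on the concomitant (scaled) lasso, not a formula reproduced from this page: S(W, σ) = ||X − XW||²_F / (2nσ) + dσ/2 + λ||W||_1, with X an n × d data matrix and W the weighted adjacency matrix. The function names are hypothetical. For fixed W, setting ∂S/∂σ = 0 gives σ̂ = ||X − XW||_F / √(nd). Substituting σ̂ back leaves √(d/n)·||X − XW||_F + λ||W||_1, whose data-fit term grows linearly rather than quadratically with the noise scale.

```python
import numpy as np

def colide_ev_score(X, W, sigma, lam):
    """Assumed equal-variance concomitant score (scaled-lasso form)."""
    n, d = X.shape
    resid = X - X @ W                      # SEM residuals, one column per node
    fit = np.sum(resid ** 2) / (2 * n * sigma)
    return fit + d * sigma / 2 + lam * np.abs(W).sum()

def closed_form_sigma(X, W):
    """Minimizer of the assumed score over sigma > 0 for a fixed W."""
    n, d = X.shape
    resid = X - X @ W
    return np.sqrt(np.sum(resid ** 2) / (n * d))

# Example: chain x0 -> x1 -> x2 with unit-variance Gaussian noise
rng = np.random.default_rng(0)
n, d = 1000, 3
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -0.8],
                   [0.0, 0.0, 0.0]])
E = rng.standard_normal((n, d))
X = E @ np.linalg.inv(np.eye(d) - W_true)   # solves X = X W + E for X
sigma_hat = closed_form_sigma(X, W_true)    # should be close to the true std of 1
```

Because σ̂ is available in closed form, the scale can be refreshed exactly after every update of W. The quadratic-over-linear form above is also jointly convex in (W, σ) for σ > 0.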

Abstract

We deal with the combinatorial problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model (SEM). Leveraging advances in differentiable, nonconvex characterizations of acyclicity, recent efforts have advocated a continuous constrained optimization paradigm to efficiently explore the space of DAGs. Most existing methods employ lasso-type score functions to guide this search, which (i) require expensive penalty parameter retuning when the unknown SEM noise variances change across problem instances; and (ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we propose a new convex score function for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels. Regularization via a smooth, nonconvex acyclicity penalty term yields CoLiDE (Concomitant Linear DAG Estimation), a regression-based criterion amenable to efficient gradient computation and closed-form estimation of noise variances in heteroscedastic scenarios. Our algorithm outperforms state-of-the-art methods without incurring added complexity, especially when the DAGs are larger and the noise level profile is heterogeneous. We also find that CoLiDE exhibits enhanced stability, manifested via reduced standard deviations in several domain-specific metrics, underscoring the robustness of our novel linear DAG estimator.
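The abstract pairs closed-form noise variances in the heteroscedastic case with a smooth acyclicity penalty that has an inexpensive gradient. The sketch below illustrates both ingredients under stated assumptions, and its function names are hypothetical.

* The heteroscedastic score is assumed to take a per-node scaled-lasso form, Σ_i [ ||x_i − X w_i||² / (2nσ_i) + σ_i/2 ] + λ||W||_1. It is minimized over each σ_i by σ_i = ||x_i − X w_i|| / √n.
* The penalty shown is the log-determinant acyclicity function h_s(W) = −log det(sI − W∘W) + d log s, as used in DAGMA. It is zero exactly on DAGs and has gradient 2(sI − W∘W)^{-T} ∘ W.

Whether CoLiDE uses exactly these constants and this penalty should be checked against the paper.

```python
import numpy as np

def nv_sigmas(X, W):
    """Closed-form per-node noise scales: sigma_i = ||x_i - X w_i|| / sqrt(n)."""
    n = X.shape[0]
    resid = X - X @ W                          # column i holds node i's residual
    return np.sqrt(np.sum(resid ** 2, axis=0) / n)

def nv_score(X, W, sigmas, lam):
    """Assumed heteroscedastic concomitant score: one scale sigma_i per node."""
    n = X.shape[0]
    rss = np.sum((X - X @ W) ** 2, axis=0)     # per-node residual sum of squares
    return np.sum(rss / (2 * n * sigmas) + sigmas / 2) + lam * np.abs(W).sum()

def logdet_acyclicity(W, s=1.0):
    """Log-det acyclicity h(W) = -log det(sI - W*W) + d log s, and its gradient.

    Defined when the spectral radius of W*W (elementwise square) is below s;
    h is zero exactly when W is the weighted adjacency matrix of a DAG.
    """
    d = W.shape[0]
    A = W * W
    if np.max(np.abs(np.linalg.eigvals(A))) >= s:
        raise ValueError("spectral radius of W*W must be below s")
    M = s * np.eye(d) - A
    _, logabsdet = np.linalg.slogdet(M)        # numerically stable log-determinant
    h = -logabsdet + d * np.log(s)
    grad = 2 * np.linalg.inv(M).T * W          # elementwise product with W
    return h, grad

# Example: chain x0 -> x1 -> x2 with unequal noise standard deviations
rng = np.random.default_rng(1)
n, d = 2000, 3
W_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.5],
                   [0.0, 0.0, 0.0]])
noise_std = np.array([0.5, 1.0, 2.0])
E = rng.standard_normal((n, d)) * noise_std
X = E @ np.linalg.inv(np.eye(d) - W_true)      # solves X = X W + E for X
sig = nv_sigmas(X, W_true)                     # one noise-scale estimate per node

h_dag, _ = logdet_acyclicity(W_true)           # acyclic: h is (numerically) zero
W_cyc = np.array([[0.0, 0.5],
                  [0.5, 0.0]])                 # 2-cycle x0 <-> x1
h_cyc, g_cyc = logdet_acyclicity(W_cyc)        # cyclic: h > 0
```

Both the per-node scales and the penalty gradient are closed-form. A first-order solver can therefore alternate exact σ updates with gradient steps on W.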

Paper Structure

This paper contains 30 sections, 28 equations, 17 figures, 4 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Mean DAG recovery performance, plus/minus one standard deviation, is evaluated for ER4 (top row) and SF4 (bottom row) graphs, each with 200 nodes, assuming equal noise variances. Each column corresponds to a different noise distribution.
  • Figure 2: Mean DAG recovery performance, plus/minus one standard deviation, under heteroscedastic noise for both ER4 (top row) and SF4 (bottom row) graphs with varying numbers of nodes. Each column corresponds to a different noise distribution.
  • Figure 3: Mean relative noise estimation errors, plus/minus one standard deviation, as a function of the number of samples, aggregated from ten separate ER4 graphs, each comprising 200 nodes.
  • Figure 4: Tracking performance of mini-batch stochastic gradient descent in relation to the output of the original CoLiDE-EV algorithm. The left plot illustrates the tracking of the output graph ${\mathbf W}^{\star}$, while the right plot represents the tracking of the noise level $\sigma^{\star}$.
  • Figure 5: DAG recovery performance assessed for ER4 and SF4 graphs with 200 nodes, assuming equal noise variances. Each row represents a distinct noise distribution, and the shaded area depicts the standard deviation. The first two columns display SID (structural intervention distance), while the remaining columns display FDR (false discovery rate).

Theorems & Definitions (1)

  • Remark