CoLiDE: Concomitant Linear DAG Estimation

Seyed Saman Saboksayr; Gonzalo Mateos; Mariano Tepper

TL;DR

A new convex score function is proposed for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels.

Abstract

We deal with the combinatorial problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model (SEM). Leveraging advances in differentiable, nonconvex characterizations of acyclicity, recent efforts have advocated a continuous constrained optimization paradigm to efficiently explore the space of DAGs. Most existing methods employ lasso-type score functions to guide this search, which (i) require expensive penalty parameter retuning when the $\textit{unknown}$ SEM noise variances change across problem instances; and (ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we propose a new convex score function for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels. Regularization via a smooth, nonconvex acyclicity penalty term yields CoLiDE ($\textbf{Co}$ncomitant $\textbf{Li}$near $\textbf{D}$AG $\textbf{E}$stimation), a regression-based criterion amenable to efficient gradient computation and closed-form estimation of noise variances in heteroscedastic scenarios. Our algorithm outperforms state-of-the-art methods without incurring added complexity, especially when the DAGs are larger and the noise level profile is heterogeneous. We also find CoLiDE exhibits enhanced stability manifested via reduced standard deviations in several domain-specific metrics, underscoring the robustness of our novel linear DAG estimator.
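The abstract does not spell out the score function, so the following is only a sketch under stated assumptions. It assumes the equal-variance score takes the smoothed concomitant lasso form $\frac{1}{2n\sigma}\|\mathbf{X}-\mathbf{X}\mathbf{W}\|_F^2 + \frac{d\sigma}{2} + \lambda\|\mathbf{W}\|_1$, with samples stored as rows of $\mathbf{X}$, and that the acyclicity penalty is the log-determinant function $h(\mathbf{W}) = -\log\det(s\mathbf{I}-\mathbf{W}\circ\mathbf{W}) + d\log s$. The function names, the row-wise data layout, and the `sigma_min` floor are illustrative choices, not the authors' code.

```python
import numpy as np

def concomitant_score(X, W, sigma, lam):
    """Assumed equal-variance concomitant score (X is n x d, rows are samples)."""
    n, d = X.shape
    R = X - X @ W  # residuals of the linear SEM x = W^T x + z
    return np.sum(R**2) / (2 * n * sigma) + d * sigma / 2 + lam * np.abs(W).sum()

def closed_form_sigma(X, W, sigma_min=1e-8):
    """Minimizer of the assumed score over sigma for fixed W.

    Setting the derivative in sigma to zero gives sigma^2 = ||R||_F^2 / (n d).
    sigma_min is a hypothetical floor that keeps the score well defined.
    """
    n, d = X.shape
    R = X - X @ W
    return max(np.sqrt(np.sum(R**2) / (n * d)), sigma_min)

def logdet_acyclicity(W, s=1.0):
    """Assumed log-determinant acyclicity function; it is zero when W is a DAG."""
    d = W.shape[0]
    _, logabsdet = np.linalg.slogdet(s * np.eye(d) - W * W)
    return -logabsdet + d * np.log(s)

# Small demo on a 3-node chain 0 -> 1 -> 2 with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, d = 500, 3
W_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, -0.5],
                   [0.0, 0.0, 0.0]])
Z = rng.normal(size=(n, d))
X = Z @ np.linalg.inv(np.eye(d) - W_true)  # solves X = X W + Z

sigma_hat = closed_form_sigma(X, W_true)
print("estimated noise scale:", sigma_hat)
print("score at sigma_hat:", concomitant_score(X, W_true, sigma_hat, lam=0.1))
print("acyclicity value:", logdet_acyclicity(W_true))
```

Under these assumptions, substituting the closed-form scale back into the score leaves $\sqrt{d/n}\,\|\mathbf{X}-\mathbf{X}\mathbf{W}\|_F + \lambda\|\mathbf{W}\|_1$. Rescaling the data rescales the fitting term and the estimated $\sigma$ by the same factor, which is one way to read the claim that the sparsity parameter is decoupled from the noise level.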

Paper Structure (30 sections, 28 equations, 17 figures, 4 tables, 1 algorithm)

Introduction
Preliminaries and Problem Statement
Related work
Concomitant Linear DAG Estimation
Further algorithmic details
Experimental Results
Concluding Summary, Limitations, and Future Work
Additional related work
Background on smoothed concomitant lasso estimators
Algorithmic derivations
Equal noise variance
Non-equal noise variance
Gradient of the log-determinant acyclicity function
Guarantees for (heteroscedastic) linear Gaussian SEMs
Implementation details
...and 15 more sections

Figures (17)

Figure 1: Mean DAG recovery performance, plus/minus one standard deviation, is evaluated for ER4 (top row) and SF4 (bottom row) graphs, each with 200 nodes, assuming equal noise variances. Each column corresponds to a different noise distribution.
Figure 2: Mean DAG recovery performance, plus/minus one standard deviation, under heteroscedastic noise for both ER4 (top row) and SF4 (bottom row) graphs with varying numbers of nodes. Each column corresponds to a different noise distribution.
Figure 3: Mean relative noise estimation errors, plus/minus one standard deviation, as a function of the number of samples, aggregated from ten separate ER4 graphs, each comprising 200 nodes.
Figure 4: Tracking performance of mini-batch stochastic gradient descent in relation to the output of the original CoLiDE-EV algorithm. The left plot illustrates the tracking of the output graph ${\mathbf W}^{\star}$, while the right plot represents the tracking of the noise level $\sigma^{\star}$.
Figure 5: DAG recovery performance assessed for ER4 and SF4 graphs with 200 nodes, assuming equal noise variances. Each row represents a distinct noise distribution, and the shaded area depicts the standard deviation. The first two columns display SID, while the remaining columns focus on FDR.
...and 12 more figures

Theorems & Definitions (1)

Remark
