Structure Learning with Continuous Optimization: A Sober Look and Beyond

Ignavier Ng, Biwei Huang, Kun Zhang

TL;DR

This paper critically examines continuous optimization approaches to DAG structure learning, focusing on NOTEARS and GOLEM under both the equal noise variances (EV) and non-equal noise variances (NV) formulations. It challenges the claim that varsortability alone explains their success: in the EV case the claim may not hold once there are more than two variables, and in the NV case performance is heavily affected by nonconvexity and initialization, with data standardization potentially degrading it. The authors examine alternative explanations and empirical evidence across linear and nonlinear settings, highlighting how nonconvexity, thresholding, and sparsity penalties shape the final solutions. They propose directions such as adopting the NV formulation, adaptive thresholding, and SCAD/MCP penalties to improve reliability and broaden empirical evaluation. Overall, the work urges a careful reassessment of when and how continuous structure learning methods are applied in practice, and suggests concrete avenues to improve their robustness and generality.
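To make the mention of SCAD and MCP penalties concrete, here is a minimal sketch of their standard textbook definitions next to the L1 penalty. The function names are hypothetical, and the defaults `gamma=3.0` and `a=3.7` are commonly used values, not settings taken from the paper; how the authors plug such penalties into the structure learning objective is not shown here.

```python
def l1_penalty(t, lam):
    """L1 penalty: grows linearly with |t| without bound."""
    return lam * abs(t)


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP); constant once |t| exceeds gamma * lam."""
    x = abs(t)
    if x <= gamma * lam:
        return lam * x - x * x / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty; equals L1 for |t| <= lam and is constant beyond a * lam."""
    x = abs(t)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    return 0.5 * (a + 1.0) * lam * lam


lam = 0.1
for weight in (0.05, 0.5, 5.0):
    print(weight, l1_penalty(weight, lam), mcp_penalty(weight, lam), scad_penalty(weight, lam))
```

Unlike L1, both penalties flatten out for large weights, so strong edges are not penalized more heavily as their magnitude grows, while small weights are still pushed toward zero.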

Abstract

This paper investigates in which cases continuous optimization for directed acyclic graph (DAG) structure learning can and cannot perform well and why this happens, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement between the order of increasing marginal variances and the topological order, and demonstrated that these approaches do not perform well after data standardization. We analyze this phenomenon for continuous approaches assuming equal and non-equal noise variances, and show that the statement may not hold in either case by providing counterexamples, justifications, and possible alternative explanations. We further demonstrate that nonconvexity may be a main concern especially for the non-equal noise variances formulation, while recent advances in continuous structure learning fail to achieve improvement in this case. Our findings suggest that future works should take into account the non-equal noise variances formulation to handle more general settings and for a more comprehensive empirical evaluation. Lastly, we provide insights into other aspects of the search procedure, including thresholding and sparsity, and show that they play an important role in the final solutions.
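The link between marginal variances, topological order, and standardization can be illustrated with a small population-level calculation. The sketch below is an assumption-laden toy, not the paper's experimental setup: it uses a hypothetical three-node chain with edge weight 2 and unit noise variances, and the helper names are invented for illustration.

```python
def marginal_variances(B, noise_vars):
    """Population variances of X in a linear SEM X_j = sum_i B[i][j] * X_i + N_j.

    Assumes the variables are indexed in a topological order, so B is strictly
    upper triangular (B[i][j] != 0 only if i < j), and the noises are independent.
    """
    d = len(noise_vars)
    # coef[j][k] is the coefficient of noise N_k in the expansion of X_j.
    coef = [[0.0] * d for _ in range(d)]
    for j in range(d):
        coef[j][j] = 1.0
        for i in range(j):
            if B[i][j] != 0.0:
                for k in range(d):
                    coef[j][k] += B[i][j] * coef[i][k]
    return [sum(c * c * v for c, v in zip(coef[j], noise_vars)) for j in range(d)]


def variance_order_matches(variances):
    """True if the marginal variances strictly increase along the given order."""
    return all(a < b for a, b in zip(variances, variances[1:]))


# Hypothetical chain X1 -> X2 -> X3 with equal noise variances (the EV setting).
B = [[0.0, 2.0, 0.0],
     [0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0]]
noise_vars = [1.0, 1.0, 1.0]

raw_vars = marginal_variances(B, noise_vars)
# Standardizing X_j divides its noise term by sd(X_j), so the implied noise
# variance becomes noise_vars[j] / var(X_j) and is no longer equal across nodes.
standardized_noise_vars = [v / s for v, s in zip(noise_vars, raw_vars)]

print(raw_vars)
print(variance_order_matches(raw_vars))
print(standardized_noise_vars)
```

In this toy example the raw marginal variances increase along the causal order, which is the pattern the varsortability argument relies on. After standardization every marginal variance is one, and the implied noise variances differ across nodes, so a model that was EV on the raw scale becomes NV on the standardized scale.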

Paper Structure

This paper contains 57 sections, 4 theorems, 31 equations, 22 figures, 2 tables.

Key Result

Proposition 1

Consider the parameters $(B,\Omega)$ of the linear SEMs over variables $X=(X_1,\dots,X_d)$, $d\geq 3$, where the noise variables follow Gaussian distributions. In the large sample limit, the set of parameters such that varsortability equals one and that the true DAG does not have the lowest least sq

Figures (22)

  • Figure 1: Examples of triangle structures.
  • Figure 2: Noise ratio after data standardization.
  • Figure 3: Linear Gaussian-EV formulation without standardization.
  • Figure 4: Linear Gaussian-NV formulation without standardization.
  • Figure 5: Linear Gaussian-NV formulation with standardization.
  • ...and 17 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Lemma 2