Structure Learning with Continuous Optimization: A Sober Look and Beyond
Ignavier Ng, Biwei Huang, Kun Zhang
TL;DR
This paper critically examines continuous optimization approaches for DAG structure learning, focusing on NOTEARS and GOLEM under both equal (EV) and non-equal (NV) noise variances. It challenges the claim that varsortability alone explains their success. In the EV case, it shows that this explanation breaks down beyond two variables. In the NV case, it shows that results are heavily affected by nonconvexity and initialization, and that data standardization can degrade performance. The authors analyze alternative explanations and empirical evidence across linear and nonlinear settings, highlighting the central roles of nonconvexity, thresholding, and sparsity penalties in determining the final solutions. They propose directions such as adopting the non-equal noise variances formulation, adaptive thresholding, and SCAD/MCP sparsity penalties to improve reliability, along with broader empirical evaluations. Overall, the work calls for a careful reassessment of when and how continuous structure learning methods are applied in practice.
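The sparsity penalties and thresholding mentioned above can be made concrete with a small sketch. The code below uses the standard elementwise definitions of SCAD and MCP and compares them with the L1 penalty: L1 keeps growing with |w|, whereas SCAD and MCP level off to a constant, so large edge weights are not shrunk further. It also includes a fixed hard-thresholding step of the kind applied to a learned weighted adjacency matrix. The function names and the defaults `a=3.7`, `gamma=3.0`, and `tau=0.3` are illustrative assumptions, not the paper's settings. The sketch does not show how the penalties enter the NOTEARS or GOLEM objectives, and it does not implement adaptive thresholding.

```python
import numpy as np


def l1_penalty(w, lam):
    """Lasso penalty: grows linearly in |w| without bound."""
    return lam * np.abs(w)


def scad_penalty(w, lam, a=3.7):
    """SCAD penalty (needs a > 2): linear near zero, quadratic blend, then constant."""
    t = np.abs(w)
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def mcp_penalty(w, lam, gamma=3.0):
    """MCP penalty (needs gamma > 1): concave up to gamma * lam, then constant."""
    t = np.abs(w)
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), gamma * lam**2 / 2)


def hard_threshold(W, tau=0.3):
    """Return a copy of the weighted adjacency matrix with |W_ij| < tau set to zero."""
    W = np.array(W, dtype=float, copy=True)
    W[np.abs(W) < tau] = 0.0
    return W


# Compare how strongly each penalty charges small and large weights.
weights = np.array([0.05, 0.5, 1.0, 5.0])
print("L1:  ", l1_penalty(weights, lam=0.5))
print("SCAD:", scad_penalty(weights, lam=0.5))
print("MCP: ", mcp_penalty(weights, lam=0.5))

# Post-process a toy estimate: small spurious weights are removed.
W_est = np.array([[0.0, 0.1], [0.8, 0.0]])
print(hard_threshold(W_est))
```

Because SCAD and MCP become flat for large |w|, they reduce the bias that L1 places on strong edges. The fixed `tau` in `hard_threshold` is exactly the kind of hand-picked choice that adaptive thresholding would try to avoid.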
Abstract
This paper investigates when continuous optimization for directed acyclic graph (DAG) structure learning does and does not perform well, and why, and suggests possible directions to make the search procedure more reliable. Reisach et al. (2021) suggested that the remarkable performance of several continuous structure learning approaches is primarily driven by a high agreement between the order of increasing marginal variances and the topological order, and demonstrated that these approaches do not perform well after data standardization. We analyze this phenomenon for continuous approaches assuming equal and non-equal noise variances, and show that the statement may not hold in either case by providing counterexamples, justifications, and possible alternative explanations. We further demonstrate that nonconvexity may be a main concern, especially for the non-equal noise variances formulation, and that recent advances in continuous structure learning fail to yield improvements in this case. Our findings suggest that future work should take into account the non-equal noise variances formulation, both to handle more general settings and to enable a more comprehensive empirical evaluation. Lastly, we provide insights into other aspects of the search procedure, including thresholding and sparsity, and show that they play an important role in the final solutions.
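The agreement between increasing marginal variances and the topological order (varsortability, in Reisach et al.'s terminology) can be computed directly, which makes the effect of standardization easy to see. The sketch below follows our reading of that metric: over all directed paths of length 1 to d-1, it counts the fraction along which the marginal variance increases, with ties counted as 1/2. The helper names, the equal-variance Gaussian linear SEM, and the 5-node chain with weight magnitudes in [1.2, 2] are illustrative assumptions, not the paper's experimental setup. The code only shows what standardization does to the variance ordering; as the abstract argues, that ordering alone may not explain how the methods perform.

```python
import numpy as np


def simulate_linear_sem(W, n, noise_std=1.0, rng=None):
    """Sample n rows from the linear SEM X = X W + N with equal noise variances.

    W[i, j] != 0 encodes an edge i -> j; W must be acyclic.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = W.shape[0]
    N = rng.normal(scale=noise_std, size=(n, d))
    # Solve X (I - W) = N for X.
    return N @ np.linalg.inv(np.eye(d) - W)


def varsortability(X, W, tol=1e-9):
    """Fraction of directed paths i -> j along which the marginal variance increases.

    A pair linked by paths of several lengths k = 1..d-1 is counted once per
    length; pairs with (numerically) equal variances count as 1/2.
    """
    A = (W != 0).astype(int)
    var = X.var(axis=0)
    ratio = var[None, :] / var[:, None]  # ratio[i, j] = Var(X_j) / Var(X_i)
    Ak = A.copy()  # Ak[i, j] > 0 iff a path of length k goes from i to j
    n_paths, n_sorted = 0.0, 0.0
    for _ in range(A.shape[0] - 1):
        mask = Ak > 0
        n_paths += mask.sum()
        n_sorted += (mask & (ratio > 1 + tol)).sum()
        n_sorted += 0.5 * (mask & (np.abs(ratio - 1) <= tol)).sum()
        Ak = ((Ak @ A) > 0).astype(int)
    return n_sorted / n_paths if n_paths > 0 else float("nan")


rng = np.random.default_rng(0)
d = 5
W = np.zeros((d, d))
for i in range(d - 1):
    # Chain X0 -> X1 -> ... -> X4 with |weight| drawn from [1.2, 2].
    W[i, i + 1] = rng.choice([-1, 1]) * rng.uniform(1.2, 2.0)

X = simulate_linear_sem(W, n=10_000, rng=rng)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

v_raw = varsortability(X, W)
v_std = varsortability(X_std, W)
# Variances grow along this chain, so v_raw should be at or near 1.0;
# after standardization every variance ratio is a tie, so v_std is 0.5.
print(v_raw, v_std)
```

Standardization sets every marginal variance to one, so any method that implicitly relies on the variance ordering loses that signal. This is the phenomenon the paper re-examines for the equal and non-equal noise variances formulations.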
