$ψ$DAG: Projected Stochastic Approximation Iteration for DAG Structure Learning

Klea Ziu, Slavomír Hanzely, Loka Li, Kun Zhang, Martin Takáč, Dmitry Kamzolov

TL;DR

The paper presents a framework for learning DAGs that combines a Stochastic Approximation approach with Stochastic Gradient Descent (SGD)-based optimization. It introduces new projection methods tailored to efficiently enforce DAG constraints, ensuring that the algorithm converges to a feasible local minimum.

Abstract

Learning the structure of Directed Acyclic Graphs (DAGs) presents a significant challenge due to the vast combinatorial search space of possible graphs, which scales exponentially with the number of nodes. Recent advancements have redefined this problem as a continuous optimization task by incorporating differentiable acyclicity constraints. These methods commonly rely on algebraic characterizations of DAGs, such as matrix exponentials, to enable the use of gradient-based optimization techniques. Despite these innovations, existing methods often face optimization difficulties due to the highly non-convex nature of DAG constraints and the per-iteration computational complexity. In this work, we present a novel framework for learning DAGs, employing a Stochastic Approximation approach integrated with Stochastic Gradient Descent (SGD)-based optimization techniques. Our framework introduces new projection methods tailored to efficiently enforce DAG constraints, ensuring that the algorithm converges to a feasible local minimum. With its low iteration complexity, the proposed method is well-suited for handling large-scale problems with improved computational efficiency. We demonstrate the effectiveness and scalability of our framework through comprehensive experimental evaluations, which confirm its superior performance across various settings.
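The algebraic characterization mentioned in the abstract can be illustrated with the well-known trace-of-matrix-exponential penalty $h(\mathbf W) = \operatorname{tr}(e^{\mathbf W \circ \mathbf W}) - d$, which is zero exactly when the weighted adjacency matrix $\mathbf W$ describes a DAG. The sketch below is an illustration in plain Python, not code from the paper: the function names are hypothetical, and the exponential series is truncated after $d$ terms, which is sufficient for the zero test because any cycle contains a simple cycle of length at most $d$.

```python
from math import factorial


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w):
    """Return sum_{k=1..d} tr(A^k) / k! with A = W∘W (hypothetical helper).

    This is the series of tr(exp(A)) - d truncated after d terms. All terms
    are nonnegative, so the value is 0 if and only if the graph has no cycle.
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]  # Hadamard square: nonnegative entries
    power = [row[:] for row in a]            # holds A^k, starting at k = 1
    total = 0.0
    for k in range(1, d + 1):
        total += sum(power[i][i] for i in range(d)) / factorial(k)
        power = matmul(power, a)
    return total


# Edges 0 -> 1 -> 2: acyclic.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]

# Same graph plus the edge 2 -> 0, which closes a cycle.
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [0.7, 0.0, 0.0]]

print(acyclicity_penalty(dag))
print(acyclicity_penalty(cyclic) > 0)
```

Each matrix product here costs $O(d^3)$ operations, which gives a sense of why penalties of this kind can be expensive to evaluate at every iteration on large graphs.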

Paper Structure

This paper contains 27 sections, 1 theorem, 15 equations, 17 figures, 1 table, 3 algorithms.

Key Result

Theorem 2

For an $L_1$-smooth function $F(\mathbf W) = \mathbb{E}_{x\sim \mathcal{D}} \left[ l(\mathbf W;x)\right]$, Algorithm \ref{alg:fr}, with access to $\sigma_1$-stochastic gradients, converges to a local minimum of problem \ref{eq:objective} at the rate where $T$ is the number of SGD-type steps.
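The two assumptions named in the theorem are commonly formalized as shown below. This is the standard reading of $L_1$-smoothness and of an unbiased stochastic gradient with variance bounded by $\sigma_1^2$; it is given here as an assumption, and the paper's exact conditions may differ. The symbols $\mathbf V$ and $g$ are introduced for illustration only.

```latex
\|\nabla F(\mathbf W) - \nabla F(\mathbf V)\|_F \le L_1 \|\mathbf W - \mathbf V\|_F
\quad \text{for all } \mathbf W, \mathbf V,
\qquad
\mathbb{E}\left[ g(\mathbf W) \right] = \nabla F(\mathbf W),
\quad
\mathbb{E}\left[ \|g(\mathbf W) - \nabla F(\mathbf W)\|_F^2 \right] \le \sigma_1^2 .
```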

Figures (17)

  • Figure 1: Minimization of \ref{eq:objective} using SGD over a fixed topological ordering of vertices on graph type ER4 with $d=100$ vertices and Gaussian noise. The plots demonstrate that minimizing \ref{eq:objective} over a fixed random vertex ordering does not approach the true solution of \ref{eq:objective}.
  • Figure 2: Linear SEM methods $\psi$DAG, GOLEM, and DAGMA on graphs of type ER4 with $d = 1000$ nodes and different noise distributions: Gaussian (first), exponential (second), and Gumbel (third).
  • Figure 3: Runtime (hours) of $\psi$DAG, GOLEM, and DAGMA for ER2 and ER4 graph types with a small number of nodes, $d \in \{10, 50, 100, 500, 1000\}$. Noise distributions vary across columns: Gaussian (first), exponential (second), and Gumbel (third). $\psi$DAG scales much better as the number of nodes increases.
  • Figure 4: Runtime (hours) of $\psi$DAG, GOLEM, and DAGMA for different graph types as the graph size increases. The noise distribution is always Gaussian. \ref{sfig_er2} extends \ref{fig:gumb_exp_er2} to a large number of nodes, $d\in \{3000, 5000, 10000\}$; \ref{sfig_sf2} presents graph type SF2; and \ref{sfig_er6} showcases a denser graph structure. $\psi$DAG scales much better as the number of nodes increases. In several scenarios, both GOLEM and DAGMA failed to consistently meet the stopping criterion. For ER6 graphs with $d=100$ nodes, GOLEM failed to converge in two out of three runs, while DAGMA failed once. Additionally, DAGMA failed to converge in one out of three runs for $d=1000$. All non-converging runs were excluded from the figures.
  • Figure 5: Linear SEM methods on graphs of type ER2 with different noise distributions: Gaussian (first), exponential (second), and Gumbel (third).
  • ...and 12 more figures
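
The fixed topological ordering in the Figure 1 caption can be made concrete with a small sketch. Restricting a weight matrix to edges that point forward in a given vertex ordering always yields a DAG, but a poorly chosen ordering can discard true edges, which is consistent with the caption's observation that a fixed random ordering does not approach the true solution. The code below is an illustration only: the helper names are hypothetical, and this masking step is not claimed to be the projection method used in the paper.

```python
def project_onto_order(w, order):
    """Keep only entries w[i][j] where node i precedes node j in `order`."""
    d = len(w)
    pos = {node: rank for rank, node in enumerate(order)}
    return [[w[i][j] if pos[i] < pos[j] else 0.0 for j in range(d)]
            for i in range(d)]


def is_dag(w):
    """Check acyclicity of the support of w with Kahn's algorithm."""
    d = len(w)
    indeg = [sum(1 for i in range(d) if w[i][j] != 0) for j in range(d)]
    ready = [j for j in range(d) if indeg[j] == 0]
    visited = 0
    while ready:
        i = ready.pop()
        visited += 1
        for j in range(d):
            if w[i][j] != 0:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    return visited == d  # every node was removed, so no cycle remains


# A cyclic weight matrix (0 -> 1 -> 0 and 0 -> 1 -> 2 -> 0).
w = [[0.0, 0.9, 0.0],
     [0.4, 0.0, 1.1],
     [0.3, 0.0, 0.0]]

projected = project_onto_order(w, order=[2, 0, 1])
print(is_dag(w), is_dag(projected))

# A true DAG with edges 0 -> 1 -> 2 survives a compatible ordering
# but loses every edge under the reversed ordering.
true_dag = [[0.0, 1.5, 0.0],
            [0.0, 0.0, -2.0],
            [0.0, 0.0, 0.0]]
kept = project_onto_order(true_dag, order=[0, 1, 2])
lost = project_onto_order(true_dag, order=[2, 1, 0])
```

In this sketch the ordering is an input, so the quality of the result depends entirely on how it is chosen.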

Theorems & Definitions (1)

  • Theorem 2