Learning Gaussian DAG Models without Condition Number Bounds

Constantinos Daskalakis; Vardis Kandiros; Rui Yao

TL;DR

This work shows that the topology of a linear Gaussian DAG with equal-variance noise can be learned with a number of samples that does not depend on the condition number of the covariance matrix. It introduces a graph-specific quantity $\tau(G)$ that governs the sample complexity: $m = O\bigl(\max(b_{\min}^{-4}, \tau(G)\, b_{\min}^{-2})\, d\log(n/d)\bigr)$ samples suffice for information-theoretic recovery, and a lower bound shows this is nearly tight, up to a factor of $d$. Under an additional bounded-variance assumption, a polynomial-time LASSO-based algorithm recovers the graph with $m = O\bigl(R^2 \tau(G)^3 d^4 b_{\min}^{-4} \log n\bigr)$ samples. Together, these results separate directed from undirected Gaussian structure learning in terms of the quantities that govern their sample complexity, and they show that a condition-number-free approach is viable for high-dimensional causal structure learning.

Abstract

We study the problem of learning the topology of a directed Gaussian Graphical Model under the equal-variance assumption, where the graph has $n$ nodes and maximum in-degree $d$. Prior work has established that $O(d \log n)$ samples are sufficient for this task. However, an important factor that is often overlooked in these analyses is the dependence on the condition number of the covariance matrix of the model. Indeed, all algorithms from prior work require a number of samples that grows polynomially with this condition number. In many cases this is unsatisfactory, since the condition number could grow polynomially with $n$, rendering these prior approaches impractical in high-dimensional settings. In this work, we provide an algorithm that recovers the underlying graph and prove that the number of samples required is independent of the condition number. Furthermore, we establish lower bounds that nearly match the upper bound up to a $d$-factor, thus providing an almost tight characterization of the true sample complexity of the problem. Moreover, under a further assumption that all the variances of the variables are bounded, we design a polynomial-time algorithm that recovers the underlying graph, at the cost of an additional polynomial dependence of the sample complexity on $d$. We complement our theoretical findings with simulations on synthetic datasets that confirm our predictions.
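To make the concern about the condition number concrete, the following minimal sketch builds the covariance matrix of an equal-variance linear Gaussian model on a directed chain and prints its condition number for increasing $n$. It is an illustration only, not the algorithm from the paper: the chain graph, the unit edge weights, and the helper names (`chain_weights`, `equal_variance_covariance`, `condition_number`) are assumptions chosen for this example.

```python
import numpy as np


def chain_weights(n, b=1.0):
    """Weight matrix of the chain X_1 -> X_2 -> ... -> X_n (assumed example graph).

    B[i, j] is the weight of edge j -> i, so B is strictly lower triangular.
    """
    B = np.zeros((n, n))
    for i in range(1, n):
        B[i, i - 1] = b
    return B


def equal_variance_covariance(B, sigma2=1.0):
    """Covariance of X = B X + eps with eps ~ N(0, sigma2 * I).

    Solving for X gives X = (I - B)^{-1} eps, hence
    Cov(X) = sigma2 * (I - B)^{-1} (I - B)^{-T}.
    """
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B)
    return sigma2 * A @ A.T


def condition_number(n, b=1.0):
    """2-norm condition number of the chain model's covariance matrix."""
    return float(np.linalg.cond(equal_variance_covariance(chain_weights(n, b))))


if __name__ == "__main__":
    # Every node has in-degree at most 1, yet the condition number keeps growing with n.
    for n in (10, 20, 40, 80):
        print(n, condition_number(n))
```

In this toy model the maximum in-degree stays at 1 while the condition number grows roughly quadratically in $n$, which is the kind of regime where a sample complexity that scales polynomially with the condition number becomes impractical.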
