Differentiable Structure Learning and Causal Discovery for General Binary Data
Chang Deng, Bryon Aragam
TL;DR
This work addresses causal structure learning for general binary data without restricting the data-generating process. It introduces a fully differentiable framework based on the multivariate Bernoulli distribution: the method learns a sparse DAG that is consistent with the equivalence class by optimizing a cross-entropy objective over higher-order interaction features, with acyclicity enforced through a differentiable constraint. The authors show that DAGs are not identifiable from observational data alone, characterize the full equivalence class of compatible graph–parameter pairs, and show that, under the sparsest Markov representation (SMR) assumption, the sparsest graph is identifiable up to Markov equivalence. They further provide a population-level guarantee that suitable regularization recovers this minimal equivalence class. To scale, they propose BiNOTEARS, a two-stage approach that leverages discrete adaptations of continuous DAG learners and achieves competitive performance on synthetic and real networks. Overall, the paper offers a principled, assumption-light pathway for learning complex causal structures in discrete data, with theoretical guarantees and practical scalability.
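To make the idea of a differentiable acyclicity constraint concrete, the following is a minimal sketch in plain Python. It uses the polynomial penalty h(W) = tr((I + W∘W/d)^d) - d known from the continuous DAG-learning literature. This summary does not say which constraint the paper itself uses, so that choice is an assumption, and the names `matmul` and `acyclicity` are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w):
    """Polynomial acyclicity penalty h(W) = tr((I + W∘W / d)^d) - d.

    Assumed form, not taken from the paper. h(W) is zero exactly when the
    weighted adjacency matrix W has no directed cycle, and positive otherwise.
    """
    d = len(w)
    # M = I + (W ∘ W) / d; squaring the entries makes every weight non-negative.
    m = [[(1.0 if i == j else 0.0) + w[i][j] ** 2 / d for j in range(d)]
         for i in range(d)]
    # Raise M to the d-th power by repeated multiplication.
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, m)
    return sum(power[i][i] for i in range(d)) - d


# Chain 0 -> 1 -> 2 is acyclic; adding the edge 2 -> 0 closes a cycle.
chain = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.5, 0.0, 0.0]]
print(acyclicity(chain), acyclicity(cycle))
```

The penalty vanishes for the chain and is strictly positive once the cycle is closed. Because it is a polynomial in the entries of W, it can be combined with a smooth loss and handled by a gradient-based solver.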
Abstract
Existing methods for differentiable structure learning in discrete data typically assume that the data are generated from specific structural equation models. However, these assumptions may not align with the true data-generating process, which limits the general applicability of such methods. Furthermore, current approaches often ignore the complex dependence structure inherent in discrete data and consider only linear effects. We propose a differentiable structure learning framework that is capable of capturing arbitrary dependencies among discrete variables. We show that although general discrete models are unidentifiable from purely observational data, it is possible to characterize the complete set of compatible parameters and structures. Additionally, we establish identifiability up to Markov equivalence under mild assumptions. We formulate the learning problem as a single differentiable optimization task in the most general form, thereby avoiding the unrealistic simplifications adopted by previous methods. Empirical results demonstrate that our approach effectively captures complex relationships in discrete data.
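A small worked example may help show why linear effects alone cannot capture arbitrary dependence among binary variables. The sketch below is an illustration under assumed notation, not the authors' implementation. It expands the log-odds of a binary child over all subsets of its parents, recovering one coefficient per subset by Möbius inversion. The function names and the noisy-XOR table are hypothetical.

```python
import math
from itertools import combinations, product


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def interaction_coefficients(cond_prob, n_parents):
    """Coefficient f_S for every subset S of parents, via Moebius inversion.

    cond_prob maps a tuple of n_parents bits to P(child = 1 | parents = bits).
    """
    coef = {}
    for size in range(n_parents + 1):
        for s in combinations(range(n_parents), size):
            total = 0.0
            # Alternating sum of logits over all subsets t of s.
            for k in range(len(s) + 1):
                for t in combinations(s, k):
                    bits = tuple(1 if i in t else 0 for i in range(n_parents))
                    sign = (-1) ** (len(s) - len(t))
                    total += sign * logit(cond_prob[bits])
            coef[s] = total
    return coef


def predict(coef, bits):
    """P(child = 1 | bits): sum the coefficients whose parents are all on."""
    z = sum(f for s, f in coef.items() if all(bits[i] for i in s))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical noisy XOR: the child is likely 1 when exactly one parent is 1.
noisy_xor = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.1}
coef = interaction_coefficients(noisy_xor, 2)
for bits in product((0, 1), repeat=2):
    print(bits, round(predict(coef, bits), 6))
print("pairwise interaction coefficient:", coef[(0, 1)])
```

The expansion reproduces the conditional table exactly, and the coefficient on the pair of parents is nonzero. A model restricted to an intercept plus one term per parent therefore cannot represent this relationship, which is the kind of higher-order dependence a general framework needs to accommodate.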
