Order-based Structure Learning with Normalizing Flows
Hamidreza Kamkari, Vahid Balazadeh, Vahid Zehtab, Rahul G. Krishnan
TL;DR
The paper addresses causal structure learning from observational data. It relaxes the common additive-noise model (ANM) assumption by using autoregressive normalizing flows (ANFs), and it frames structure learning as a search over topological orderings. It introduces OSLow, which uses a masked-flow ensemble to model multiple orderings and a differentiable permutation-learning objective based on a Boltzmann distribution over permutation matrices, enabling gradient-based optimization over the discrete space of orderings. The authors prove identifiability results for restricted location-scale noise models (LSNMs) modeled with affine ANFs and report state-of-the-art performance on Sachs and SynTReN, including accurate estimation of interventional distributions from observational data. The results indicate that relaxing the ANM assumption can yield practical gains in real-world causal discovery, with potential extensions to more general non-linear post-flow models.
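To make the link between LSNMs and affine ANFs concrete, here is a minimal sketch of a two-variable location-scale noise model and the affine autoregressive map that recovers its noise. The functions `loc` and `scale`, the Gaussian noise, and the fixed order x1 -> x2 are illustrative assumptions, not the parameterization used in OSLow, where the location and scale functions are learned.

```python
import numpy as np

def loc(parent):
    # Hypothetical location function t(.), chosen only for illustration.
    return np.sin(parent)

def scale(parent):
    # Hypothetical strictly positive scale function s(.).
    return np.exp(0.5 * np.tanh(parent))

def sample_lsnm(n, rng):
    """Sample from a two-variable LSNM with causal order x1 -> x2."""
    eps = rng.standard_normal((n, 2))
    x1 = eps[:, 0]
    x2 = loc(x1) + scale(x1) * eps[:, 1]  # location-scale mechanism
    return np.stack([x1, x2], axis=1), eps

def affine_flow_to_noise(x):
    """Affine autoregressive map under the order (x1, x2): undo each affine step."""
    z1 = x[:, 0]
    z2 = (x[:, 1] - loc(x[:, 0])) / scale(x[:, 0])
    return np.stack([z1, z2], axis=1)

def log_likelihood(x):
    """Change of variables: standard-normal base density plus log|det dz/dx|."""
    z = affine_flow_to_noise(x)
    log_base = (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)).sum(axis=1)
    log_det = -np.log(scale(x[:, 0]))  # dz2/dx2 = 1 / s(x1), dz1/dx1 = 1
    return log_base + log_det

rng = np.random.default_rng(0)
x, eps = sample_lsnm(1000, rng)
z = affine_flow_to_noise(x)
```

With the correct order, the map recovers the generating noise up to floating-point error. Setting `scale` to a constant 1 reduces the mechanism to an additive-noise model, which is the special case the paper relaxes.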
Abstract
Estimating the causal structure of observational data is a challenging combinatorial search problem that scales super-exponentially with graph size. Existing methods use continuous relaxations to make this problem computationally tractable but often restrict the data-generating process to additive noise models (ANMs) through explicit or implicit assumptions. We present Order-based Structure Learning with Normalizing Flows (OSLow), a framework that relaxes these assumptions using autoregressive normalizing flows. We leverage the insight that searching over topological orderings is a natural way to enforce acyclicity in structure discovery and propose a novel, differentiable permutation learning method to find such orderings. Through extensive experiments on synthetic and real-world data, we demonstrate that OSLow outperforms prior baselines and improves performance on the observational Sachs and SynTReN datasets as measured by structural Hamming distance and structural intervention distance, highlighting the importance of relaxing the ANM assumption made by existing methods.
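The two combinatorial claims above can be checked on small graphs. The sketch below counts labeled DAGs with Robinson's recurrence to show the super-exponential growth, and builds the edge mask implied by an ordering to show that any graph respecting it is acyclic. The helper names `count_dags`, `order_mask`, and `is_acyclic` are hypothetical and are not taken from the OSLow code.

```python
import itertools
from math import comb, factorial

import numpy as np

def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def order_mask(order):
    """mask[i, j] = 1 iff node i precedes node j in `order` (edge i -> j allowed)."""
    n = len(order)
    pos = np.empty(n, dtype=int)
    pos[list(order)] = np.arange(n)  # pos[v] = rank of node v in the ordering
    return (pos[:, None] < pos[None, :]).astype(int)

def is_acyclic(adj):
    """A directed graph on n nodes is acyclic iff it has no walk of length n."""
    n = adj.shape[0]
    return not np.linalg.matrix_power(adj, n).any()

# The number of DAGs grows far faster than the n! orderings.
for n in range(1, 6):
    print(n, factorial(n), count_dags(n))

# The densest graph allowed by any ordering of 4 nodes is still a DAG,
# so every subgraph of it is acyclic as well.
all_ok = all(is_acyclic(order_mask(p)) for p in itertools.permutations(range(4)))
```

Searching over orderings therefore never requires a separate acyclicity penalty: once an ordering is fixed, the mask rules out every cycle by construction, and the remaining task is to decide which of the allowed edges are present.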
