Are Bayesian networks typically faithful?
Philip Boeken, Patrick Forré, Joris M. Mooij
TL;DR
This work investigates whether faithfulness is a typical property of Bayesian networks. It proves that, among all distributions that are Markov relative to a fixed DAG $G$, the faithful distributions form a dense, open set in the total variation topology, so that the unfaithful distributions form a nowhere dense set. It further shows that, under mild regularity conditions, for Bayesian networks parametrised by conditional exponential families the faithful parameters form a dense, open set and the unfaithful parameters have Lebesgue measure zero, extending known results for linear Gaussian and discrete models. The results also extend to networks with latent variables via latent projections to ADMGs. Together, the findings provide a topological notion of typicality for faithfulness that supports its use as an assumption in causal discovery across broad model classes, while noting limitations due to the chosen notion of typicality and the potential separation between topological and statistical notions of faithfulness.
Abstract
Faithfulness is a ubiquitous assumption in causal inference, often motivated by the fact that the faithful parameters of linear Gaussian and discrete Bayesian networks are typical, and the folklore belief that this should also hold for other classes of Bayesian networks. We address this open question by showing that among all Bayesian networks over a given DAG, the faithful Bayesian networks are indeed 'typical': they constitute a dense, open set with respect to the total variation metric. However, this does not imply that faithfulness is typical in restricted classes of Bayesian networks, as are often considered in statistical applications. To this end, we consider the class of Bayesian networks parametrised by conditional exponential families, for which we show that under mild regularity conditions, the faithful parameters constitute a dense, open set and the unfaithful parameters have Lebesgue measure zero, extending the existing results for linear Gaussian and discrete Bayesian networks. Finally, we show that the aforementioned results also hold for Bayesian networks with latent variables.
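As a minimal illustration of why unfaithful parameters are atypical in the classical linear Gaussian case, consider the following sketch. It assumes a hypothetical three-variable model on the complete DAG X -> Y, Y -> Z, X -> Z with edge coefficients a, b, c and independent standard normal noise. The model, the function names and the sampling range are illustrative assumptions and are not taken from the paper. Because this DAG has no d-separations, every conditional independence is a faithfulness violation, and each one corresponds to the vanishing of a polynomial in (a, b, c).

```python
import random


def ci_polynomials(a, b, c):
    """Polynomials whose vanishing encodes a conditional independence.

    Assumed (hypothetical) model with independent N(0, 1) noise terms:
        X = e_X,  Y = a*X + e_Y,  Z = b*Y + c*X + e_Z.
    Marginal independences correspond to zero covariances; conditional
    ones to zero entries of the precision matrix (I - B)^T (I - B).
    """
    return {
        "X _||_ Y": a,                          # cov(X, Y)
        "X _||_ Z": a * b + c,                  # cov(X, Z): path cancellation
        "Y _||_ Z": b * (a * a + 1) + a * c,    # cov(Y, Z)
        "X _||_ Y | Z": a - b * c,              # precision entry, up to sign
        "X _||_ Z | Y": c,                      # precision entry, up to sign
        "Y _||_ Z | X": b,                      # precision entry, up to sign
    }


def violated(a, b, c, tol=0.0):
    """Names of the independences that hold (within tol) at (a, b, c)."""
    return [name for name, value in ci_polynomials(a, b, c).items()
            if abs(value) <= tol]


if __name__ == "__main__":
    # A hand-picked unfaithful point: the direct effect c cancels a*b.
    print(violated(0.5, 2.0, -1.0))
    # An arbitrarily small perturbation of c restores faithfulness.
    print(violated(0.5, 2.0, -1.0 + 1e-9))
    # Randomly drawn parameters are expected to be faithful almost surely.
    rng = random.Random(0)
    draws = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
             for _ in range(10_000)]
    print(sum(bool(violated(a, b, c)) for a, b, c in draws))
```

Each polynomial vanishes only on a lower-dimensional surface of the parameter space, which is the intuition behind the measure-zero and density statements for linear Gaussian models. The sketch covers only this special case and does not reproduce the total variation or exponential-family arguments.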
