Expert-Aided Causal Discovery of Ancestral Graphs
Tiago da Silva, Bruna Bazaluk, Eliezer de Souza da Silva, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita, Adèle Helena Ribeiro
TL;DR
Causal discovery under latent confounding is hampered by scarce data and by the lack of uncertainty quantification in most methods. The authors propose Ancestral GFlowNets (AGFN), a probabilistic method that samples ancestral graphs from a score-based belief $p(\mathcal{G}) \propto R(\mathcal{G})$, with $R(\mathcal{G})=\exp\{(U(\mathcal{G})-\mu)/\sigma\}$, where $U(\mathcal{G})$ is the graph's score and $\mu$, $\sigma$ are shift and scale constants. AGFN is paired with an active loop that queries a possibly unreliable expert to refine inference without retraining. The framework combines Bayesian-style updating of beliefs about edges and the ancestral relations they encode with a learned forward policy. It uses a cross-entropy-based acquisition function to choose which questions to ask the expert (a simulated human or an LLM such as GPT-4o), and the resulting feedback improves structural Hamming distance (SHD) and Bayesian information criterion (BIC) results on synthetic data and on the Sachs dataset. The outcome is uncertainty-aware causal discovery (CD) that remains competitive with state-of-the-art methods while robustly incorporating imperfect expert knowledge, which is what real-world causal analysis under hidden confounding requires.
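To make the belief $p(\mathcal{G}) \propto R(\mathcal{G})$ concrete, the sketch below normalizes $R$ over a tiny, explicitly enumerated set of graphs. The helper `belief_from_scores`, the graph labels and the score values are hypothetical placeholders chosen for illustration, not results from the paper. Real AGFN does not enumerate graphs; as a GFlowNet it learns a sampler whose samples are approximately distributed in proportion to $R$.

```python
import math


def belief_from_scores(scores, mu, sigma):
    """Normalize R(G) = exp((U(G) - mu) / sigma) over an explicit set of graphs.

    `scores` maps a hypothetical graph label to its score U(G). Enumerating
    graphs is only feasible for toy examples; a GFlowNet instead learns a
    sampler whose samples are approximately distributed proportionally to R.
    """
    logits = {g: (u - mu) / sigma for g, u in scores.items()}
    top = max(logits.values())  # subtract the max so exp() cannot overflow
    weights = {g: math.exp(v - top) for g, v in logits.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical placeholder scores for three candidate graphs (not real results).
scores = {"G1": -10.0, "G2": -12.0, "G3": -15.0}
belief = belief_from_scores(scores, mu=0.0, sigma=1.0)
shifted = belief_from_scores(scores, mu=100.0, sigma=1.0)  # mu cancels out
flatter = belief_from_scores(scores, mu=0.0, sigma=10.0)   # larger sigma -> flatter
print(belief)
```

The shift $\mu$ drops out after normalization, so only $\sigma$ changes the shape of the belief. It acts like a temperature: larger values spread probability mass more evenly across graphs.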
Abstract
Causal discovery (CD) algorithms are notably brittle when data is scarce, inferring unreliable causal relations that may contradict expert knowledge, especially when considering latent confounders. Furthermore, the lack of uncertainty quantification in most CD methods hinders users from diagnosing and refining results. To address these issues, we introduce Ancestral GFlowNets (AGFNs). AGFN samples ancestral graphs (AGs) proportionally to a score-based belief distribution representing our epistemic uncertainty over the causal relationships. Building upon this distribution, we propose an elicitation framework for expert-driven assessment. This framework comprises an optimal experimental design to probe the expert and a scheme to incorporate the obtained feedback into AGFN. Our experiments show that: i) AGFN is competitive with other methods that address latent confounding on both synthetic and real-world datasets; and ii) our design for incorporating feedback from a (simulated) human expert or a Large Language Model (LLM) improves inference quality.
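The abstract does not spell out the feedback rule or the design criterion (the TL;DR says only that the acquisition function is based on cross-entropy). The sketch below is a simplified stand-in under explicit assumptions, not the authors' scheme. The belief is an explicit table over a few hypothetical graphs. Each question asks whether one variable is an ancestor of another. The expert errs independently with a fixed `error_rate`. The next question is simply the relation with the highest binary entropy, which replaces the paper's cross-entropy criterion. All function names here are hypothetical.

```python
import math


def relation_marginal(belief, ancestors, pair):
    """Probability that pair = (x, y), read as 'x is an ancestor of y', holds."""
    return sum(p for g, p in belief.items() if pair in ancestors[g])


def binary_entropy(q):
    """Entropy (in nats) of a yes/no relation with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log(q) + (1.0 - q) * math.log(1.0 - q))


def pick_query(belief, ancestors, candidate_pairs):
    """Simplified stand-in criterion: ask about the most uncertain relation."""
    return max(candidate_pairs,
               key=lambda pair: binary_entropy(relation_marginal(belief, ancestors, pair)))


def update_with_feedback(belief, ancestors, pair, answer, error_rate):
    """Bayes' rule, assuming the expert errs independently with prob. error_rate."""
    posterior = {}
    for g, p in belief.items():
        agrees = (pair in ancestors[g]) == answer
        posterior[g] = p * ((1.0 - error_rate) if agrees else error_rate)
    total = sum(posterior.values())
    return {g: w / total for g, w in posterior.items()}


# Hypothetical toy belief over three graphs and the ancestral pairs each implies.
belief = {"G1": 0.6, "G2": 0.25, "G3": 0.15}
ancestors = {"G1": {("A", "B")}, "G2": {("A", "C")}, "G3": {("A", "B"), ("A", "C")}}
pairs = [("A", "B"), ("A", "C"), ("B", "A")]

query = pick_query(belief, ancestors, pairs)  # ("A", "C"), marginal 0.4
posterior = update_with_feedback(belief, ancestors, query, answer=True, error_rate=0.1)
print(query, posterior)
```

Setting `error_rate=0.5` models an uninformative expert and leaves the belief unchanged. Smaller values let a single answer move more probability mass. In AGFN the belief is represented by a trained sampler rather than a table. The toy table only makes the arithmetic of incorporating noisy feedback visible.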
