Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling
Xudong Sun, Alex Markham, Pratik Misra, Carsten Marr
TL;DR
This work identifies and analyzes critical pitfalls in implicit unobserved confounding synthesis, notably the restricted spectrum from diagonally dominant constructions of the idiosyncratic covariance Ω and the limited bidirected-edge structures in ancestral ADMG generation. It then introduces an explicit, block-hierarchical confounding synthesis approach that generates a ground-truth DAG, hides selected variables, and converts the result into an ancestral graph for evaluation, thereby ensuring broader coverage of the causal-model space. The explicit formulation shows that Ω can be expressed as Ω = Λ E(ξ ξ^T) Λ^T + E(ε_O ε_O^T), which, with suitable constraints, spans the space of symmetric positive definite matrices and connects to the implicit parameterization, enabling robust comparisons of causal-discovery methods. The proposed protocol supports heterogeneous graph structures, scalable ancestral sampling (including Wishart-based weight sampling), and principled DAG-to-ancestral-graph transformation, improving realism and diversity in synthetic benchmarks and providing a principled bridge between implicit and explicit confounding parameterizations.
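As a rough illustration of the explicit formulation Ω = Λ E(ξ ξ^T) Λ^T + E(ε_O ε_O^T), the sketch below builds Ω from a loading matrix and two covariance terms using NumPy. The function name `explicit_omega`, the Gaussian loadings, and the diagonal choices for the latent and noise covariances are assumptions made for this example only; they are not the sampling scheme of the paper.

```python
import numpy as np

def explicit_omega(n_obs, n_latent, rng):
    """Build Omega = Lam @ Cov(xi) @ Lam.T + Cov(eps_O) (illustrative only)."""
    # Lam: loadings from hidden confounders xi onto observed variables
    # (assumption: i.i.d. standard normal entries).
    lam = rng.normal(size=(n_obs, n_latent))
    # E[xi xi^T]: covariance of the hidden variables
    # (assumption: diagonal with positive variances).
    cov_xi = np.diag(rng.uniform(0.5, 2.0, size=n_latent))
    # E[eps_O eps_O^T]: noise covariance of the observed variables
    # (assumption: diagonal with positive variances).
    cov_eps = np.diag(rng.uniform(0.5, 2.0, size=n_obs))
    return lam @ cov_xi @ lam.T + cov_eps

rng = np.random.default_rng(0)
omega = explicit_omega(n_obs=5, n_latent=2, rng=rng)

# Omega is symmetric, and positive definite because the first term is
# positive semidefinite and the second is positive definite.
print(np.allclose(omega, omega.T))
print(np.linalg.eigvalsh(omega).min() > 0)
```

The off-diagonal entries of Ω here arise only from variables that share a hidden confounder through Λ, which is the sense in which the explicit construction induces the correlated noise that the implicit parameterization writes directly into Ω.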
Abstract
Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the idiosyncratic covariance matrix while preserving positive definiteness. Within this approach, we identify two distinct issues in state-of-the-art protocols that hinder unbiased sampling from the complete space of causal models: first, we give a detailed analysis of how the use of diagonally dominant constructions restricts the spectrum of partial correlation matrices; and second, the sampling of bidirected edges restricts the possible graphical structures, unnecessarily ruling out valid causal models. To address these limitations, we propose an improved explicit modeling approach for unobserved confounding, leveraging block-hierarchical ancestral generation of ground truth causal graphs. We provide algorithms for converting the ground truth DAG into an ancestral graph, so that the output of causal discovery algorithms can be compared against it. We draw connections between the implicit and explicit parameterizations and prove that our approach fully covers the space of causal models, including those generated by the implicit parameterization, thus enabling more robust evaluation of methods for causal discovery and inference.
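To give some intuition for why diagonal dominance can restrict the spectrum, the sketch below uses the Gershgorin circle theorem rather than the analysis in the paper. For a symmetric, strictly diagonally dominant matrix with unit diagonal, every eigenvalue lies in the open interval (0, 2), whereas a general positive definite matrix with unit diagonal can have eigenvalues up to its dimension. The helper name `is_strictly_diag_dominant` and the specific matrices are assumptions chosen for illustration.

```python
import numpy as np

def is_strictly_diag_dominant(m):
    """True if |m_ii| exceeds the sum of |m_ij|, j != i, in every row."""
    diag = np.abs(np.diag(m))
    off_sum = np.abs(m).sum(axis=1) - diag
    return bool(np.all(diag > off_sum))

n, rho = 5, 0.9

# Equicorrelation matrix: unit diagonal, positive definite, but not
# diagonally dominant. Its largest eigenvalue is 1 + (n - 1) * rho.
equi = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
eig_equi = np.linalg.eigvalsh(equi)

# Random symmetric matrix with unit diagonal, rescaled so that every
# row's off-diagonal absolute sum is at most 0.9 (hence dominant).
rng = np.random.default_rng(0)
off = rng.uniform(-1, 1, size=(n, n))
off = (off + off.T) / 2
np.fill_diagonal(off, 0.0)
off *= 0.9 / np.abs(off).sum(axis=1).max()
dd = np.eye(n) + off
eig_dd = np.linalg.eigvalsh(dd)

print("diagonally dominant:", is_strictly_diag_dominant(dd), eig_dd.round(3))
print("equicorrelation:    ", is_strictly_diag_dominant(equi), eig_equi.round(3))
```

In this toy comparison the diagonally dominant matrix has all eigenvalues inside (0, 2), while the equicorrelation matrix reaches a largest eigenvalue of 4.6, a spectrum that a diagonally dominant construction with unit diagonal cannot produce.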
