Efficient Latent Variable Causal Discovery: Combining Score Search and Targeted Testing
Joseph Ramsey, Bryan Andrews, Peter Spirtes
TL;DR
The paper tackles causal discovery with latent variables and possible selection bias, where standard CI-based methods such as FCI suffer from repeated testing and become unstable under near-unfaithfulness. It introduces a spectrum of methods: BOSS-FCI and GRaSP-FCI, score-guided hybrids within the GFCI framework; LV-Dumb, a fast structural baseline; and FCIT, a targeted-testing refinement that uses recursive path blocking to find separating sets with far fewer CI tests while guaranteeing well-formed PAGs. The theoretical guarantees cover correctness, edge-minimality, and orientation soundness, with full completeness attainable if all of Zhang's orientation rules are applied. Empirically, FCIT achieves the best balance of precision, efficiency, and structural validity across simulations and real data; LV-Dumb offers a fast, practical alternative, and BOSS-FCI and GRaSP-FCI provide robust baselines. Together, these methods advance scalable, reliable causal discovery under latent confounding, with practical implications for large datasets across diverse domains.
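The TL;DR credits FCIT's efficiency to recursive path blocking, but the exact procedure is not spelled out here. The sketch below therefore shows only the generic criterion any path-blocking search must respect: d-separation in a DAG, decided by checking whether a candidate conditioning set blocks every path between two nodes. It assumes a DAG stored as a dict from each node to its set of children; all function names are hypothetical, and this is not FCIT's algorithm. With latent variables the relevant notion becomes m-separation in a MAG, which generalizes this test.

```python
from typing import Dict, Iterator, List, Set

DAG = Dict[str, Set[str]]  # hypothetical encoding: node -> set of its children


def descendants(dag: DAG, node: str) -> Set[str]:
    """All nodes reachable from `node` along directed edges (excluding `node`)."""
    seen: Set[str] = set()
    stack = list(dag.get(node, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, ()))
    return seen


def simple_paths(dag: DAG, x: str, y: str) -> Iterator[List[str]]:
    """Yield every simple path from x to y in the DAG's skeleton (directions ignored)."""
    nbrs: Dict[str, Set[str]] = {}
    for u, kids in dag.items():
        nbrs.setdefault(u, set())
        for v in kids:
            nbrs[u].add(v)
            nbrs.setdefault(v, set()).add(u)

    def extend(path: List[str]) -> Iterator[List[str]]:
        last = path[-1]
        if last == y:
            yield list(path)
            return
        for n in sorted(nbrs.get(last, ())):
            if n not in path:
                path.append(n)
                yield from extend(path)
                path.pop()

    yield from extend([x])


def is_blocked(dag: DAG, path: List[str], z: Set[str]) -> bool:
    """A path is blocked by z if some interior node is a non-collider in z,
    or a collider that is not in z and has no descendant in z."""
    for a, b, c in zip(path, path[1:], path[2:]):
        is_collider = b in dag.get(a, ()) and b in dag.get(c, ())  # a -> b <- c
        if is_collider:
            if b not in z and not (descendants(dag, b) & z):
                return True
        elif b in z:
            return True
    return False


def d_separated(dag: DAG, x: str, y: str, z: Set[str]) -> bool:
    """True if z blocks every path between x and y."""
    return all(is_blocked(dag, p, set(z)) for p in simple_paths(dag, x, y))


# A -> C <- B and C -> D: a collider at C with descendant D.
collider_dag: DAG = {"A": {"C"}, "B": {"C"}, "C": {"D"}, "D": set()}
# X -> M -> Y: a chain through M.
chain_dag: DAG = {"X": {"M"}, "M": {"Y"}, "Y": set()}

print(d_separated(collider_dag, "A", "B", set()))  # True
print(d_separated(collider_dag, "A", "B", {"C"}))  # False
print(d_separated(collider_dag, "A", "B", {"D"}))  # False
print(d_separated(chain_dag, "X", "Y", {"M"}))     # True
```

The collider example shows why choosing a separating set is delicate: conditioning on C, or on its descendant D, opens the path A -> C <- B that the empty set leaves blocked. Enumerating paths is exponential in the worst case; practical implementations use a linear-time reachability search instead.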
Abstract
Learning causal structure from observational data is especially challenging when latent variables or selection bias are present. The Fast Causal Inference (FCI) algorithm addresses this setting but exhaustively tests conditional independence over many conditioning subsets, often leading to spurious independences, missing or extra edges, and unreliable orientations. We present a family of score-guided mixed-strategy causal search algorithms that extend this framework. First, we introduce BOSS-FCI and GRaSP-FCI, variants of GFCI (Greedy Fast Causal Inference) that substitute BOSS (Best Order Score Search) or GRaSP (Greedy Relaxations of Sparsest Permutation) for FGES (Fast Greedy Equivalence Search), preserving correctness while offering different trade-offs between scalability and conservativeness. Second, we develop FCI Targeted-Testing (FCIT), a novel hybrid method that replaces exhaustive testing with targeted, score-informed tests guided by BOSS. FCIT guarantees well-formed PAGs (Partial Ancestral Graphs) and achieves higher precision and efficiency across sample sizes. Finally, we propose a lightweight heuristic, LV-Dumb (Latent Variable "Dumb"), which returns the PAG of the BOSS DAG (Directed Acyclic Graph). Though not strictly sound in the presence of latent confounding, LV-Dumb often matches FCIT's accuracy while running substantially faster. Simulations and real-data analyses show that BOSS-FCI and GRaSP-FCI provide robust baselines, FCIT yields the best balance of precision and reliability, and LV-Dumb offers a fast, near-equivalent alternative. Together, these methods demonstrate that targeted and score-guided strategies can dramatically improve the efficiency and correctness of latent-variable causal discovery.
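To make the cost of exhaustive testing concrete: in the PC-style adjacency phase that FCI inherits, deciding whether to remove the edge between X and Y may require testing X independent of Y given S for every subset S of X's other adjacents, up to some maximum size. The sketch below computes that worst-case count, the sum over d = 0..D of C(k, d), for k adjacents and maximum depth D. It is an illustrative upper bound for one ordered pair under these assumptions only: it ignores FCI's additional Possible-D-SEP phase and early stopping once a separating set is found, it does not model FCIT, and the function name is hypothetical.

```python
from math import comb
from typing import Optional


def max_ci_tests_for_pair(k: int, max_depth: Optional[int] = None) -> int:
    """Worst-case number of conditioning sets tried for one ordered pair (X, Y).

    k: number of X's adjacents other than Y.
    max_depth: largest conditioning-set size tested (None means no limit).
    Every subset of size 0..max_depth is counted, i.e. sum_d C(k, d).
    """
    depth = k if max_depth is None else min(max_depth, k)
    return sum(comb(k, d) for d in range(depth + 1))


if __name__ == "__main__":
    for k in (5, 10, 20):
        # k, bound with unlimited depth, bound with depth capped at 3
        print(k, max_ci_tests_for_pair(k), max_ci_tests_for_pair(k, 3))
    # 5 32 26
    # 10 1024 176
    # 20 1048576 1351
```

Each of these tests can err at finite sample size, so the more tests a search runs, the more opportunities arise for the spurious independences described in the abstract.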
