Relational Causal Discovery with Latent Confounders

Matteo Negro, Andrea Piras, Ragib Ahsan, David Arbour, Elena Zheleva

TL;DR

RelFCI addresses learning causal structure from relational data under latent confounding, extending causal discovery with latent variables beyond the i.i.d. setting. It introduces Latent Relational Causal Models (LRCMs) and three lifted representations: the Maximal Ancestral Abstract Ground Graph (MAAGG), the Partial Ancestral Abstract Ground Graph (PAAGG), and the Partial Ancestral Relational Model (PARM). These model relational dependencies and their equivalence classes across perspectives, using a hop threshold $h$ (with $h' \ge 2h$ for latent paths). The RelFCI algorithm builds on FCI and RCD to handle relational domains with latent confounders and is shown to be sound and complete under ideal conditional independence testing and the stated assumptions. Experiments on synthetic data with latent confounders compare its precision and recall against RCD. Together, these contributions extend relational causal discovery to settings with latent confounding, where the underlying causal model and potential confounders are unknown.

Abstract

Estimating causal effects from real-world relational data can be challenging when the underlying causal model and potential confounders are unknown. While several causal discovery algorithms exist for learning causal models with latent confounders from data, they assume that the data is independent and identically distributed (i.i.d.) and are not well-suited for learning from relational data. Similarly, existing relational causal discovery algorithms assume causal sufficiency, which is unrealistic for many real-world datasets. To address this gap, we propose RelFCI, a sound and complete causal discovery algorithm for relational data with latent confounders. Our work builds upon the Fast Causal Inference (FCI) and Relational Causal Discovery (RCD) algorithms and defines new graphical models necessary to support causal discovery in relational domains. We also establish soundness and completeness guarantees for relational d-separation with latent confounders. We present experimental results demonstrating the effectiveness of RelFCI in identifying the correct causal structure in relational causal models with latent confounders.

Paper Structure

This paper contains 30 sections, 15 theorems, 2 equations, 19 figures, 5 algorithms.

Key Result

Lemma 1

Given a relational causal model structure $\mathcal{M}$ and perspective $\mathcal{B}$, an abstract ground graph $AGG_{\mathcal{M}\mathcal{B}}$ is ancestral if and only if all ground graphs $GG_{\mathcal{M}\sigma}$, with skeleton $\sigma \in \Sigma_{\mathcal{S}}$, are ancestral.
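
To make the property in Lemma 1 concrete, the sketch below checks whether a single mixed graph is ancestral. It uses the standard definition restricted to directed and bidirected edges: the graph has no directed cycle and no almost directed cycle (a bidirected edge between a node and one of its ancestors). The function name `is_ancestral` and the edge-list encoding are illustrative assumptions, not the paper's implementation, and the sketch does not build ground graphs or abstract ground graphs.

```python
from collections import defaultdict


def is_ancestral(directed, bidirected):
    """Check ancestrality of a mixed graph (hypothetical helper).

    directed:   iterable of (src, dst) pairs, read as src -> dst
    bidirected: iterable of (x, y) pairs, read as x <-> y
    Assumes the graph has no undirected edges.
    """
    directed = list(directed)
    bidirected = list(bidirected)

    children = defaultdict(set)
    for src, dst in directed:
        children[src].add(dst)

    def descendants(node):
        # Iterative DFS along directed edges; `node` itself appears in the
        # result only if a directed cycle leads back to it.
        seen, stack = set(), list(children[node])
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(children[cur])
        return seen

    nodes = {n for edge in directed + bidirected for n in edge}
    desc = {n: descendants(n) for n in nodes}

    # Directed cycle: some node is its own descendant.
    if any(n in desc[n] for n in nodes):
        return False

    # Almost directed cycle: x <-> y while one endpoint is an ancestor of the other.
    return all(y not in desc[x] and x not in desc[y] for x, y in bidirected)


# A -> B -> C with A <-> D: no cycle of either kind.
chain_with_confounder = is_ancestral([("A", "B"), ("B", "C")], [("A", "D")])

# A -> B -> C with A <-> C: A is an ancestor of C, so this is an almost directed cycle.
almost_directed_cycle = is_ancestral([("A", "B"), ("B", "C")], [("A", "C")])
```

Under these assumptions, `chain_with_confounder` is `True` and `almost_directed_cycle` is `False`. Lemma 1 relates this per-graph check to the lifted level: the abstract ground graph passes it exactly when every ground graph does.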

Figures (19)

  • Figure 1: Example of a Relational Causal Model
  • Figure 2: Counterexample that shows RCD does not produce the correct output AGG for LRCM with a faithful oracle
  • Figure 3: New representations to enable relational causal discovery in LRCMs
  • Figure 4: RelFCI and RCD Precision and Recall comparison. Results are combined for both 1 and 2 latent variables. Intervals represent $\pm1$ standard deviation.
  • Figure 5: RelFCI and RCD Precision and Recall performance with 1 and 2 latent variables.
  • ...and 14 more figures

Theorems & Definitions (29)

  • Definition 1: Latent Relational Variable
  • Definition 2: Latent Relational Causal Model (LRCM)
  • Definition 3: Latent Abstract Ground Graph (LAGG)
  • Definition 4: Maximal Ancestral Abstract Ground Graph (MAAGG)
  • Lemma 1
  • Definition 5: Partial Ancestral Abstract Ground Graph (PAAGG)
  • Proposition 1
  • Definition 6: Partial Ancestral Relational Model (PARM)
  • Proposition 2
  • Theorem 1
  • ...and 19 more