Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets

Christine W. Bang, Vanessa Didelez

TL;DR

This paper extends constraint-based causal discovery to settings with latent variables and multiple overlapping datasets by introducing tiered background knowledge. It formalizes tiered ordering and presents two algorithms, tFCI and tIOD, with simple and full variants, and proves soundness (and completeness for the simple versions) under oracle conditions. The work shows that leveraging tiered knowledge can substantially improve identifiability, reduce computation, and yield more informative outputs, with practical relevance for multi-cohort and longitudinal studies. It also discusses robustness in finite samples and outlines directions for extending these ideas to other FCI variants and time-series contexts.

Abstract

In this paper we consider the use of tiered background knowledge within constraint-based causal discovery. Our focus is on settings relaxing causal sufficiency, i.e., allowing for latent variables, which may arise because relevant information could not be measured at all, or not jointly, as in the case of multiple overlapping datasets. We first present novel insights into the properties of the 'tiered FCI' (tFCI) algorithm. Building on this, we introduce a new extension of the IOD (integrating overlapping datasets) algorithm incorporating tiered background knowledge, the 'tiered IOD' (tIOD) algorithm. We show that under full usage of the tiered background knowledge tFCI and tIOD are sound, while simple versions of tIOD and tFCI are sound and complete. We further show that the tIOD algorithm can often be expected to be considerably more efficient and informative than the IOD algorithm, even beyond the obvious restriction of the Markov equivalence classes. We provide a formal result on the conditions for this gain in efficiency and informativeness. Our results are accompanied by a series of examples illustrating the exact role and usefulness of tiered background knowledge.
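
To make the role of tiered background knowledge concrete, here is a minimal sketch; it is not the authors' tFCI or tIOD implementation. It assumes the common reading that a variable in a later tier cannot be a cause of a variable in an earlier tier, so every adjacency between two tiers carries an arrowhead at the later node. With latent variables allowed, the mark at the earlier node stays undetermined. The function name `orient_cross_tier_edges`, the dictionary encoding of $\tau$, and the toy variables are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def orient_cross_tier_edges(
    skeleton: List[Edge], tier: Dict[str, int]
) -> Tuple[List[Edge], List[Edge]]:
    """Split adjacencies into cross-tier and within-tier edges.

    Assumption (not taken from the paper's text): a node in a later tier
    is never an ancestor of a node in an earlier tier.

    Returns (arrowheads, undecided):
      arrowheads: pairs (earlier, later), meaning the edge has an
                  arrowhead at `later`; the mark at `earlier` is left open.
      undecided:  within-tier adjacencies, untouched by the tiers.
    """
    arrowheads: List[Edge] = []
    undecided: List[Edge] = []
    for a, b in skeleton:
        if tier[a] < tier[b]:
            arrowheads.append((a, b))
        elif tier[b] < tier[a]:
            arrowheads.append((b, a))
        else:
            undecided.append((a, b))
    return arrowheads, undecided


# Toy example: A and B are measured in tier 1, C in tier 2.
tier = {"A": 1, "B": 1, "C": 2}
skeleton = [("A", "B"), ("C", "A"), ("B", "C")]
arrowheads, undecided = orient_cross_tier_edges(skeleton, tier)
```

In this toy call, both adjacencies involving `C` receive an arrowhead at `C`, while the within-tier adjacency between `A` and `B` is left for the independence-based orientation rules.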

Paper Structure

This paper contains 23 sections, 15 theorems, 16 equations, 6 figures, 2 tables, 4 algorithms.

Key Result

proposition 1

Let $\mathcal{G}=(\mathbf{V},\mathbf{E})$ be a DAG or MAG, $\tau$ a tiered ordering of $\mathbf{V}$ consistent with $\mathcal{G}$, and let $A,B\in\mathbf{V}$ be two distinct nodes. Then $A$ and $B$ are separated in $\mathcal{G}$ by some subset of $\mathbf{V}$ if and only if they are separated by a s
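
The premise that $\tau$ is consistent with $\mathcal{G}$ can be illustrated for the DAG case with a short check. This sketch assumes that consistency means no directed edge points from a later tier to an earlier one; that reading, the helper name `is_consistent`, and the example graph are assumptions for illustration and do not reproduce the paper's formal definition. MAGs, which also contain bidirected edges, are not covered.

```python
from typing import Dict, List, Tuple


def is_consistent(directed_edges: List[Tuple[str, str]], tier: Dict[str, int]) -> bool:
    """Check a tiered ordering against the directed edges of a DAG.

    Assumed reading: each edge (u, v), meaning u -> v, must satisfy
    tier[u] <= tier[v], i.e. no edge runs backwards across tiers.
    """
    return all(tier[u] <= tier[v] for u, v in directed_edges)


# Hypothetical DAG A -> B -> C with A, B in tier 1 and C in tier 2.
dag = [("A", "B"), ("B", "C")]
tau = {"A": 1, "B": 1, "C": 2}
consistent = is_consistent(dag, tau)

# Moving C into an earlier tier than B makes B -> C run backwards.
tau_bad = {"A": 1, "B": 2, "C": 1}
inconsistent = is_consistent(dag, tau_bad)
```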

Figures (6)

  • Figure 1: Toy example of how four different cohort studies can overlap in time and variables.
  • Figure 2: Examples of graphs visited by the IOD algorithm. Here, (e) is the PAG of (a), but more graphs may encode the marginal independence models learned from dataset 1 and dataset 2. Given oracle knowledge of (a) as input, the algorithm considers all graphs where combinations of the edges $A\circ\mkern-6.5mu-\mkern-6.5mu\circ D$, $A\circ\mkern-6.5mu-\mkern-6.5mu\circ B$, $A\circ\mkern-6.5mu-\mkern-6.5mu\circ C$, $B\circ\mkern-6.5mu-\mkern-6.5mu\circ D$ and $C\circ\mkern-6.5mu-\mkern-6.5mu\circ D$ are removed. Here, we illustrate the removal of $A\circ\mkern-6.5mu-\mkern-6.5mu\circ D$. In total, the IOD visits 73 graphs, including graphs based on (b), (c),$\ldots$, (k), and it outputs the eight graphs in Figure 3.
  • Figure 3: Left: A MAG $\mathcal{M}$ with tiered ordering $\tau$, where the variables are measured in two datasets, dataset 1 and dataset 2. Right: All graphs visited by the tIOD algorithm (Algorithm \ref{alg:tiod}). Black edges are obtained up to line 50, blue edges are obtained from orientation rules R1-R4 and R8-R10. Crossed out graphs do not satisfy the criteria of line 65; all other graphs are output by the tIOD algorithm. In this example, the IOD algorithm visits more graphs but still outputs the same PAGs as the simple tIOD.
  • Figure 4: (a) MAG with tiered ordering of the nodes, with measurements in dataset 1 and dataset 2. (b) The intermediate graph $\mathcal{G}$ obtained at line 32 of the tIOD algorithm (Algorithm \ref{alg:tiod}) with oracle knowledge of (a) as input. (c) An example graph output by the IOD algorithm, with (a) as input, that would not have been output by the simple tIOD.
  • Figure 5: Orientation rules for the FCI algorithm [spirtes1999fci, zhang2008completeness]. Here $\blacksquare$ is a placeholder edge mark equivalent to $*$.
  • ...and 1 more figure

Theorems & Definitions (40)

  • remark 1
  • definition 1: Tiered ordering
  • definition 2: Tiered background knowledge
  • definition 3: Cross-tier edge
  • definition 4
  • proposition 1
  • proposition 2
  • proposition 3
  • proposition 4
  • proposition 5
  • ...and 30 more