s-ID: Causal Effect Identification in a Sub-Population

Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash

TL;DR

This work defines and addresses identifiability of causal effects within a sub-population by introducing the s-ID problem, which seeks $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from the sub-population observational distribution $P^{\textsc{s}}(\mathbf{V}) = P(\mathbf{V} \vert S=1)$. It formalizes the augmented graph with an auxiliary binary variable $S$ to model sub-populations and derives necessary and sufficient graph conditions for both singleton and multivariate cases of $(\mathbf{X}, \mathbf{Y})$, proving identifiability criteria and impossibility results. A sound and complete algorithm, SID, is proposed to compute $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from $P^{\textsc{s}}(\mathbf{V})$ when identifiable, with complexity $O(n+m)$. The paper positions s-ID as a practical alternative to existing ID/gID and c-ID/c-gID frameworks, highlighting the risk of erroneous inference when sub-population subtleties are ignored, and outlines future work on latent-variable extensions and full estimation pipelines. The results provide a principled, graph-based pathway to targeted causal inference within sub-populations, enabling more accurate policy evaluation and intervention planning in biased samples.

Abstract

Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population through the influence of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, and thus cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.
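
To make the data setting concrete, the sketch below shows what "observational data of the targeted sub-population" means when the sub-population is modeled by a binary selection variable $S$, so that the available distribution is $P(\mathbf{V} \vert S=1)$. The joint table, the selection probabilities, and the function names are illustrative assumptions, not taken from the paper. In this toy population $X$ and $Y$ are independent, yet they become dependent once we condition on $S=1$.

```python
import numpy as np

def subpopulation_distribution(joint_xys):
    """Return P(X, Y | S=1) from a joint table indexed [x, y, s]."""
    selected = joint_xys[:, :, 1]
    return selected / selected.sum()

def conditional_y_given_x(p_xy):
    """Return P(Y | X) from a joint table indexed [x, y]."""
    return p_xy / p_xy.sum(axis=1, keepdims=True)

# Hypothetical population: X and Y are independent fair coins.
p_xy_population = np.full((2, 2), 0.25)
# Hypothetical selection mechanism P(S=1 | x, y); S depends on both X and Y.
p_s1_given_xy = np.array([[0.1, 0.5],
                          [0.5, 0.9]])

# Assemble the full joint P(X, Y, S) with axes (x, y, s).
joint = np.stack(
    [p_xy_population * (1 - p_s1_given_xy), p_xy_population * p_s1_given_xy],
    axis=-1,
)

p_sub = subpopulation_distribution(joint)        # P(X, Y | S=1)
pop_cond = conditional_y_given_x(p_xy_population)  # P(Y | X) in the population
sub_cond = conditional_y_given_x(p_sub)            # P(Y | X, S=1)

print("P(Y=1 | X) in the population:", pop_cond[:, 1])
print("P(Y=1 | X, S=1) in the sub-population:", sub_cond[:, 1])
```

The two printed vectors differ, which illustrates why a method that treats sub-population data as if it came from the whole population can be misled. The example only concerns observational conditionals and does not by itself decide whether any causal effect is s-ID.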

Paper Structure

This paper contains 28 sections, 18 theorems, 93 equations, 15 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

For two variables $X$ and $Y$, conditional causal effect $P^{\textsc{s}}_{X}(Y)$ is s-ID in DAG $\mathcal{G}^{\textsc{s}}$ if and only if

Figures (15)

  • Figure 1: $X$: whether the public health policy bans smoking in public areas. $Y$: rate of lung cancer. $Z$: percentage of people who smoke. $W$: the average age of people. In the left causal graph, $P_X(Y \vert S=1)$ is s-ID, i.e., can be computed from $P (X,Y,Z,W \vert S = 1)$, while it is not s-ID in the right causal graph.
  • Figure 2: Two types of DAGs used in the proof of Theorem \ref{th:markov-single}. The dotted edges indicate the presence of a directed path.
  • Figure 3: Three DAGs in which $P^{\textsc{s}}_{X}(Y)$ is s-ID.
  • Figure 4: Two DAGs in which $P^{\textsc{s}}_{X}(Y)$ is not s-ID.
  • Figure 5: An example for the multivariate case, where the conditional causal effect $P^{\textsc{s}}_{\{X_1, X_2\}}(Y)$ is not s-ID in the left graph, while in the right graph it is s-ID and equal to $\sum_{Z, W} P^{\textsc{s}}(Z, W \vert X_1) P^{\textsc{s}}(Y \vert X_2, Z, W)$.
  • ...and 10 more figures
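
The expression in the Figure 5 caption, $\sum_{Z, W} P^{\textsc{s}}(Z, W \vert X_1) P^{\textsc{s}}(Y \vert X_2, Z, W)$, can be evaluated directly from a discrete table of the sub-population distribution. The following is a minimal sketch under stated assumptions: all variables are discrete, the table is strictly positive, and the axis order `[x1, x2, z, w, y]` and the function name `figure5_effect` are hypothetical choices made for illustration. The random table below is not drawn from the graph in Figure 5; it only exercises the arithmetic.

```python
import numpy as np

def figure5_effect(p_s):
    """Evaluate sum_{Z,W} P^s(Z, W | X1) * P^s(Y | X2, Z, W).

    p_s is the sub-population joint P^s(X1, X2, Z, W, Y), indexed
    [x1, x2, z, w, y]. The result is indexed [x1, x2, y].
    """
    # P^s(Z, W | X1): marginalize out X2 and Y, then normalize over (Z, W).
    p_x1zw = p_s.sum(axis=(1, 4))                 # shape (x1, z, w)
    p_zw_given_x1 = p_x1zw / p_x1zw.sum(axis=(1, 2), keepdims=True)

    # P^s(Y | X2, Z, W): marginalize out X1, then normalize over Y.
    p_x2zwy = p_s.sum(axis=0)                     # shape (x2, z, w, y)
    p_y_given_x2zw = p_x2zwy / p_x2zwy.sum(axis=-1, keepdims=True)

    # Multiply the two factors and sum over Z and W.
    return np.einsum("azw,bzwy->aby", p_zw_given_x1, p_y_given_x2zw)

# Arbitrary strictly positive table over five binary variables (illustration only).
rng = np.random.default_rng(0)
p_s = rng.random((2, 2, 2, 2, 2)) + 0.1
p_s /= p_s.sum()

effect = figure5_effect(p_s)
print(effect.shape)          # one distribution over Y per (x1, x2) pair
print(effect.sum(axis=-1))   # each slice sums to 1
```

Each slice `effect[x1, x2, :]` is a proper distribution over $Y$, because the first factor sums to one over $(Z, W)$ and the second sums to one over $Y$. Whether that distribution equals the causal effect depends on the graph conditions, which the code does not check.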

Theorems & Definitions (40)

  • Definition 1: s-ID
  • Theorem 1
  • proof : Sketch of proof
  • Claim 1
  • Claim 2
  • Claim 3
  • Remark 1
  • Proposition 1
  • Corollary 1
  • Theorem 2
  • ...and 30 more