s-ID: Causal Effect Identification in a Sub-Population

Amir Mohammad Abouei; Ehsan Mokhtarian; Negar Kiyavash

TL;DR

This work defines and addresses identifiability of causal effects within a sub-population by introducing the s-ID problem, which seeks $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from the sub-population observational distribution $P^{\textsc{s}}(\mathbf{V}) = P(\mathbf{V} \vert S=1)$. It formalizes an augmented graph with an auxiliary binary variable $S$ that models the sub-population, and derives necessary and sufficient graphical conditions for both the singleton and multivariate cases of $(\mathbf{X}, \mathbf{Y})$, proving identifiability criteria together with impossibility results. A sound and complete algorithm, SID, computes $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from $P^{\textsc{s}}(\mathbf{V})$ whenever it is identifiable, with complexity $O(n+m)$. The paper positions s-ID as a practical alternative to the existing ID/gID and c-ID/c-gID frameworks, highlights the risk of erroneous inference when sub-population subtleties are ignored, and outlines future work on latent-variable extensions and full estimation pipelines. The results provide a principled, graph-based pathway to targeted causal inference within sub-populations, enabling more accurate policy evaluation and intervention planning in biased samples.
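By definition, $P^{\textsc{s}}(\mathbf{V}) = P(\mathbf{V} \vert S=1)$ is what remains of a population table $P(\mathbf{V}, S)$ after restricting it to $S=1$ and renormalizing. In the s-ID setting only this result is observed, not the full table. The sketch below is illustrative only and assumes discrete variables. It also assumes a joint table stored as a Python dict keyed by `(v, s)`. The function name `subpopulation_distribution` and the example numbers are hypothetical and do not come from the paper.

```python
def subpopulation_distribution(joint):
    """Return P^S(V) = P(V | S=1) from a joint table P(V, S).

    Hypothetical representation: `joint` maps (v, s) pairs, where v is a
    tuple of values for V and s is 0 or 1, to probabilities.
    """
    # Probability mass of the sub-population, P(S=1).
    p_s1 = sum(p for (v, s), p in joint.items() if s == 1)
    if p_s1 == 0:
        raise ValueError("P(S=1) must be positive")
    # Keep only the rows with S=1 and renormalize them.
    return {v: p / p_s1 for (v, s), p in joint.items() if s == 1}


# Hypothetical example: V = (X, Y), both binary.
joint_example = {
    ((0, 0), 1): 0.10, ((0, 1), 1): 0.10, ((1, 0), 1): 0.05, ((1, 1), 1): 0.25,
    ((0, 0), 0): 0.20, ((0, 1), 0): 0.10, ((1, 0), 0): 0.15, ((1, 1), 0): 0.05,
}
p_sub = subpopulation_distribution(joint_example)
```

In this toy table $P(S=1) = 0.5$, so every $S=1$ entry is doubled. For example, $P^{\textsc{s}}(X=1, Y=1) = 0.5$.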

Abstract

Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population by systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we only have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, and thus cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.
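The following toy calculation shows why an effect in a sub-population can differ from the population-wide effect. The model is hypothetical and is not taken from the paper. It has binary variables with $X \to Y \leftarrow W$ and $W \to S$. $X$ has no parents, so intervening on $X$ only fixes its value. Conditioning on $S=1$ then shifts the distribution of $W$, which gives $P_X(Y \vert S=1) = \sum_w P(Y \vert X, w)\, P(w \vert S=1)$. The function name and all parameter values below are illustrative assumptions.

```python
def interventional_prob_y1(x, p_w1, p_s1_given_w, p_y1_given_xw, condition_on_s1=False):
    """P(Y=1 | do(X=x)), or P(Y=1 | do(X=x), S=1) if condition_on_s1 is True.

    Hypothetical model: X -> Y <- W, W -> S, all binary, X has no parents.
    """
    numerator = 0.0
    denominator = 0.0
    for w in (0, 1):
        p_w = p_w1 if w == 1 else 1.0 - p_w1
        for s in (0, 1):
            if condition_on_s1 and s == 0:
                continue  # keep only the sub-population S=1
            p_s = p_s1_given_w[w] if s == 1 else 1.0 - p_s1_given_w[w]
            weight = p_w * p_s
            denominator += weight
            numerator += weight * p_y1_given_xw[(x, w)]
    return numerator / denominator


# Illustrative parameters: selection into S=1 is more likely when W=1.
p_w1 = 0.5
p_s1_given_w = {0: 0.2, 1: 0.8}
p_y1_given_xw = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.7}

for cond in (False, True):
    effect = (interventional_prob_y1(1, p_w1, p_s1_given_w, p_y1_given_xw, cond)
              - interventional_prob_y1(0, p_w1, p_s1_given_w, p_y1_given_xw, cond))
    print("sub-population" if cond else "population", round(effect, 2))
```

With these numbers, the effect of setting $X$ from 0 to 1 on $P(Y=1)$ is 0.25 in the whole population and 0.34 in the sub-population $S=1$. This illustrates why the target quantity must be defined for the sub-population itself.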

Paper Structure (28 sections, 18 theorems, 93 equations, 15 figures, 1 table, 1 algorithm)

Introduction
Preliminaries
The s-ID Problem
Modeling a Sub-Population: Auxiliary Variable $S$
Problem Formulation: Definition of s-ID
Conditions for s-Identifiability: Singleton Case
Conditions for s-Identifiability: Multivariate Case
A Sound And Complete Algorithm For s-ID
Related Work
Causal Inference in Population
Causal Inference in a Sub-Population
Causal Graph Variations
Conclusion and Future Work
A Additional Example in the Domain of Finance
B Preliminary Lemmas
...and 13 more sections

Key Result

Theorem 1

For two variables $X$ and $Y$, the conditional causal effect $P^{\textsc{s}}_{X}(Y)$ is s-ID in DAG $\mathcal{G}^{\textsc{s}}$ if and only if

Figures (15)

Figure 1: $X$: whether the public health policy bans smoking in public areas. $Y$: rate of lung cancer. $Z$: percentage of people who smoke. $W$: average age of people. In the left causal graph, $P_X(Y \vert S=1)$ is s-ID, i.e., it can be computed from $P(X, Y, Z, W \vert S = 1)$, while it is not s-ID in the right causal graph.
Figure 2: Two types of DAGs used in the proof of Theorem 1. The dotted edges indicate the presence of a directed path.
Figure 3: Three DAGs in which $P^{\textsc{s}}_{X}(Y)$ is s-ID.
Figure 4: Two DAGs in which $P^{\textsc{s}}_{X}(Y)$ is not s-ID.
Figure 5: An example for the multivariate case where the conditional causal effect $P^{\textsc{s}}_{\{X_1, X_2\}}(Y)$ is not s-ID in the left graph, while it is s-ID in the right graph and equals $\sum_{Z, W} P^{\textsc{s}}(Z, W \vert X_1)\, P^{\textsc{s}}(Y \vert X_2, Z, W)$.
...and 10 more figures
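The identification formula in the Figure 5 caption, $\sum_{Z, W} P^{\textsc{s}}(Z, W \vert X_1)\, P^{\textsc{s}}(Y \vert X_2, Z, W)$, can be evaluated directly from a discrete table of $P^{\textsc{s}}$. The sketch below only evaluates that one expression and is not the paper's SID algorithm. It assumes discrete variables and a dict keyed by `(x1, x2, z, w, y)`. It also assumes positivity, meaning that every conditioning event has nonzero probability. The function name `figure5_adjustment` is hypothetical.

```python
from collections import defaultdict


def figure5_adjustment(p_s):
    """Evaluate sum_{z,w} P^S(z, w | x1) * P^S(y | x2, z, w) for every (x1, x2, y).

    Hypothetical representation: p_s maps (x1, x2, z, w, y) to P^S(x1, x2, z, w, y).
    Assumes positivity: every marginal used as a denominator is > 0.
    """
    # Marginal tables needed for the two conditional factors.
    p_x1 = defaultdict(float)
    p_x1zw = defaultdict(float)
    p_x2zw = defaultdict(float)
    p_x2zwy = defaultdict(float)
    for (x1, x2, z, w, y), p in p_s.items():
        p_x1[x1] += p
        p_x1zw[(x1, z, w)] += p
        p_x2zw[(x2, z, w)] += p
        p_x2zwy[(x2, z, w, y)] += p

    x1_vals = sorted({k[0] for k in p_s})
    x2_vals = sorted({k[1] for k in p_s})
    zw_vals = sorted({(k[2], k[3]) for k in p_s})
    y_vals = sorted({k[4] for k in p_s})

    result = {}
    for x1 in x1_vals:
        for x2 in x2_vals:
            for y in y_vals:
                total = 0.0
                for z, w in zw_vals:
                    if p_x1[x1] == 0 or p_x2zw[(x2, z, w)] == 0:
                        raise ValueError("positivity assumption violated")
                    p_zw_given_x1 = p_x1zw[(x1, z, w)] / p_x1[x1]
                    p_y_given_x2zw = p_x2zwy[(x2, z, w, y)] / p_x2zw[(x2, z, w)]
                    total += p_zw_given_x1 * p_y_given_x2zw
                result[(x1, x2, y)] = total
    return result
```

As a sanity check, for each fixed $(x_1, x_2)$ the returned values sum to 1 over $y$. This holds because each factor $P^{\textsc{s}}(Y \vert X_2, Z, W)$ is a distribution over $Y$ and the weights $P^{\textsc{s}}(Z, W \vert X_1)$ sum to 1.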

Theorems & Definitions (40)

Definition 1: s-ID
Theorem 1
proof : Sketch of proof
Claim 1
Claim 2
Claim 3
Remark 1
Proposition 1
Corollary 1
Theorem 2
...and 30 more
