s-ID: Causal Effect Identification in a Sub-Population
Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash
TL;DR
This work defines and addresses the identifiability of causal effects within a sub-population by introducing the s-ID problem, which seeks to identify $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from the sub-population observational distribution $P^{\textsc{s}}(\mathbf{V}) = P(\mathbf{V} \vert S=1)$. It models sub-populations through an augmented graph with an auxiliary binary variable $S$ and derives necessary and sufficient graphical conditions for both the singleton and multivariate cases of $(\mathbf{X}, \mathbf{Y})$, proving identifiability criteria as well as impossibility results. A sound and complete algorithm, SID, is proposed to compute $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ from $P^{\textsc{s}}(\mathbf{V})$ whenever it is identifiable, with complexity $O(n+m)$. The paper positions s-ID as a practical alternative to the existing ID/gID and c-ID/c-gID frameworks, highlights the risk of erroneous inference when sub-population subtleties are ignored, and outlines future work on latent-variable extensions and full estimation pipelines. The results provide a principled, graph-based pathway to targeted causal inference within sub-populations, enabling more accurate policy evaluation and intervention planning in biased samples.
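To make the quantities above concrete, here is a minimal sketch on a hypothetical toy model; it is not the SID algorithm. It assumes binary variables X, Y, S with edges X → Y and X → S (selection depends on X only), uses made-up probabilities, and reads $P^{\textsc{s}}_{\mathbf{X}}(\mathbf{Y})$ as the interventional distribution conditioned on $S=1$, i.e. $P_{\mathbf{X}}(\mathbf{Y} \vert S=1)$ (our assumption). The code compares what the sub-population data reveal, $P^{\textsc{s}}(Y=1 \vert X=1)$, with that target computed from the full model.

```python
from itertools import product

# Hypothetical toy SCM over binary X, Y and selection indicator S,
# with edges X -> Y and X -> S. All numbers are made up for illustration.
P_X1 = 0.3                        # P(X=1)
P_Y1_GIVEN_X = {0: 0.2, 1: 0.7}   # P(Y=1 | X=x)
P_S1_GIVEN_X = {0: 0.9, 1: 0.4}   # P(S=1 | X=x)


def bern(p1, v):
    """Probability that a binary variable with P(=1) = p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1


def joint(do_x=None):
    """Joint P(x, y, s); if do_x is given, X is clamped by a do-intervention."""
    dist = {}
    for x, y, s in product((0, 1), repeat=3):
        px = bern(P_X1, x) if do_x is None else float(x == do_x)
        dist[(x, y, s)] = px * bern(P_Y1_GIVEN_X[x], y) * bern(P_S1_GIVEN_X[x], s)
    return dist


def sub_population(dist):
    """Condition a joint over (x, y, s) on S=1, returning a table over (x, y)."""
    p_s1 = sum(p for (_, _, s), p in dist.items() if s == 1)
    return {(x, y): p / p_s1 for (x, y, s), p in dist.items() if s == 1}


def prob_y1_given_x(dist_xy, x):
    """P(Y=1 | X=x) from a joint table over (x, y)."""
    return dist_xy[(x, 1)] / (dist_xy[(x, 0)] + dist_xy[(x, 1)])


# Observable in the sub-population: P^S(Y=1 | X=1) = P(Y=1 | X=1, S=1).
observed = prob_y1_given_x(sub_population(joint()), x=1)
# Target under our reading: P_{X=1}(Y=1 | S=1), computed from the full SCM.
target = prob_y1_given_x(sub_population(joint(do_x=1)), x=1)
print(round(observed, 6), round(target, 6))
```

Both values come out to 0.7 (up to floating-point rounding), so in this particular graph the target can be read directly off sub-population data. With other selection mechanisms such a simple equality need not hold; deciding when an expression in terms of $P^{\textsc{s}}(\mathbf{V})$ exists is what the graphical conditions above address.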
Abstract
Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which differs from the whole population because of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we only have access to observational data from the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population and thus cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.
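The risk of erroneous inference can be illustrated with a hypothetical toy model (a sketch with made-up numbers, not an example from the paper): edges X → Y → S, so membership in the sub-population depends on the outcome. Because X has no parents here, the population effect $P_{x}(Y)$ equals $P(Y \vert x)$, whereas data from the sub-population only reveal $P^{\textsc{s}}(Y \vert x) = P(Y \vert x, S=1)$. The sketch computes both in closed form.

```python
# Hypothetical toy SCM with edges X -> Y -> S: selection into the
# sub-population (S=1) depends on the outcome Y. Numbers are made up.
P_Y1_GIVEN_X = {0: 0.2, 1: 0.6}   # P(Y=1 | X=x)
P_S1_GIVEN_Y = {0: 0.2, 1: 0.8}   # P(S=1 | Y=y)


def population_effect(x):
    """P_x(Y=1): X has no parents, so do(X=x) equals conditioning on X=x."""
    return P_Y1_GIVEN_X[x]


def sub_population_conditional(x):
    """P^S(Y=1 | X=x) = P(Y=1 | X=x, S=1), via Bayes' rule over Y.

    P(X=x) cancels out, so only P(Y | X) and P(S | Y) are needed.
    """
    num = P_Y1_GIVEN_X[x] * P_S1_GIVEN_Y[1]
    den = num + (1.0 - P_Y1_GIVEN_X[x]) * P_S1_GIVEN_Y[0]
    return num / den


for x in (0, 1):
    print(f"x={x}: population P_x(Y=1)={population_effect(x):.3f}, "
          f"sub-population P^S(Y=1|x)={sub_population_conditional(x):.3f}")
```

With these numbers the population effect is $P_{X=1}(Y=1) = 0.6$, while the sub-population yields $P^{\textsc{s}}(Y=1 \vert X=1) = 6/7 \approx 0.857$, so reading sub-population data as if they came from the entire population would overstate the effect. If $P^{\textsc{s}}_{X}(Y)$ is read as $P_{X}(Y \vert S=1)$ (an assumption here), then in this graph the value 6/7 is exactly the sub-population effect, which is why keeping the two targets separate matters.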
