The Categorical Instrumental Variable Model: Characterization, Partial Identification, and Statistical Inference

Yilin Song, F. Richard Guo, K. C. Gary Chan, Thomas S. Richardson

TL;DR

This work analyzes partial identification for categorical instrumental variable models where $Z$, $X$, and $Y$ are finite-valued. It derives a simple closed-form characterization of the joint counterfactual distribution $P'(Y(x_1),\dots,Y(x_K))$ via linear inequalities that link counterfactuals to the observed $P(X,Y\mid Z)$, and shows these are necessary, sufficient, and non-redundant across five IV models defined by exclusion and independence variants. The sufficiency proof uses Strassen's theorem to construct couplings, yielding a polyhedral description that supports sharp bounds on linear functionals such as the average treatment effect and enables a falsification test. For inference, the paper develops a conservative finite-sample CI framework based on a KL-divergence tail bound, implemented through convex programming, and demonstrates practical utility with the Minneapolis Domestic Violence Experiment data, where multi-arm IV analysis is essential.

Abstract

We study categorical instrumental variable (IV) models with instrument, treatment, and outcome taking finitely many values. We derive a simple closed-form characterization of the set of joint distributions of potential outcomes that are compatible with a given observed data distribution in terms of a set of inequalities. These inequalities unify several different IV models defined by versions of the independence and exclusion restriction assumptions and are shown to be non-redundant. Finally, given a set of linear functionals of the joint counterfactual distribution, such as pairwise average treatment effects, we construct confidence intervals with simultaneous finite-sample coverage, using a tail bound on the Kullback--Leibler divergence. We illustrate our method using data from the Minneapolis Domestic Violence Experiment.
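To make the idea of a confidence region built from a KL-divergence tail bound concrete, here is a minimal sketch. The abstract does not state which tail bound the paper uses, so this example swaps in the classical method-of-types bound, $P(\mathrm{KL}(\hat p \,\|\, p) \ge \varepsilon) \le (n+1)^k e^{-n\varepsilon}$ for $n$ i.i.d. draws from a distribution on $k$ categories. That bound is typically looser than more refined ones. The function names `kl_divergence`, `kl_radius`, and `in_confidence_region` are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p_hat, p):
    """KL(p_hat || p) in nats, using the convention 0 * log(0 / q) = 0."""
    total = 0.0
    for a, b in zip(p_hat, p):
        if a > 0:
            if b == 0:
                # p_hat puts mass where p has none: divergence is infinite.
                return math.inf
            total += a * math.log(a / b)
    return total


def kl_radius(n, k, alpha):
    """Radius eps solving (n + 1)^k * exp(-n * eps) = alpha.

    Assumption: the method-of-types bound stands in for the paper's
    tail bound, which is not given in this summary.
    """
    return (k * math.log(n + 1) + math.log(1 / alpha)) / n


def in_confidence_region(counts, p, alpha=0.05):
    """Check whether candidate distribution p lies in the KL ball
    around the empirical distribution of the observed counts."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    return kl_divergence(p_hat, p) <= kl_radius(n, len(counts), alpha)


# Hypothetical counts over three categories.
counts = [30, 50, 20]
print(in_confidence_region(counts, [0.3, 0.5, 0.2]))
print(in_confidence_region(counts, [0.8, 0.1, 0.1]))
```

By construction, the set of candidates `p` that pass this check covers the true distribution with probability at least `1 - alpha` at any sample size. A confidence interval for a linear functional could then be obtained by optimizing that functional over all counterfactual distributions compatible with some `p` in the region, which is where a convex program would enter. That step is not sketched here.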

Paper Structure (26 sections, 17 theorems, 61 equations, 6 figures, 6 tables, 1 algorithm)

Key Result

Lemma 2.1

We have ${\cal M}_1 \subset {\cal M}_2 \subset {\cal M}_3$, ${\cal M}_1 \subset {\cal M}_4$ and ${\cal M}_2 \subset {\cal M}_5$.

Figures (6)

  • Figure 1.1: Directed acyclic graph (DAG) representing the assumptions of a valid instrumental variable, where the dashed edges are assumed to be absent.
  • Figure 2.1: Nested structure between models ${{\cal M}}_1$--${{\cal M}}_5$.
  • Figure 2.2: Graphical representations of the independence and exclusion assumptions discussed in Section \ref{sec:assumptions}. ${\cal M}_1$ and ${\cal M}_4$ do not have confounding between $Z$ and $X$, and independence is encoded using the extension of d-separation to acyclic graphs with bi-directed ($\leftrightarrow$) edges \citep{richardson-admg:2003}; ${\cal M}_2$, ${\cal M}_3$ and ${\cal M}_5$ allow confounding between $Z$ and $X$, and their independence assumptions follow from Pearl's d-separation for directed acyclic graphs. (Note that (e) encodes a slightly stronger version of (A\ref{assumption:indep}-4) with $X(z)$ replacing $X$.) In (a) and (b), when a variable is connected to its parents with double edges ($\Rightarrow$), the variable is a deterministic function of its parents. The individual exclusion assumption (A\ref{assumption:exclusion}-1) in ${\cal M}_1$ and ${\cal M}_2$ follows because $Y$ is determined by $\{Y(x,z)\}$ and $X$ (and not $Z$); the joint stochastic exclusion assumption (A\ref{assumption:exclusion}-2) in ${\cal M}_3$ cannot be (easily) represented graphically and is stated explicitly; individual exclusion in ${\cal M}_4$ is implied because the SWIG contains $Y(x)$ rather than $Y(x,z)$; the latent stochastic exclusion assumption in ${\cal M}_5$ is signified by the absence of an edge from $z$ to $Y(x,z)$; indeed, $z$ is d-separated from $Y(x,z)$ given $U$ \citep{malinsky19b, RichardsonRobins-2023}.
  • Figure 5.1: Illustration of pairs $(\bm{a},\bm{b}) \in {\cal A} \times {\cal B}$ when $K=M=2$ under a fixed instrument arm $z$. Each edge corresponds to a coherent pair.
  • Figure B.1: For each $B$ (vertices in box), $\mathcal{R}_{C}(B)$ consists of both blue edges (between ${\cal N}_{\mathcal{R}_C}'(B)$ and $B$) and red edges (between $\overline{{\cal N}_{\mathcal{R}_C}'(B)}$ and $\overline{B}$). By Lemma \ref{lemma:reverse-ineq}, there is a one-to-one correspondence between ${\cal V} \in {\cal A}$ and $B \subseteq {\cal B}$, and we have ${\cal V}=\overline{{\cal N}_{\mathcal{R}_C}'(B)}$. Note that the edges in (c) are contained in those in (a).
  • ...and 1 more figure

Theorems & Definitions (44)

  • Lemma 2.1
  • proof
  • Theorem 3.1
  • Remark 3.1
  • Remark 3.2
  • Corollary 3.1
  • Theorem 3.2
  • Corollary 3.2
  • Remark 3.3
  • Example 1
  • ...and 34 more