General sample size analysis for probabilities of causation: a delta method approach

Tianyuan Cheng, Ruirui Mao, Judea Pearl, Ang Li

Abstract

Probabilities of causation (PoCs), such as the probability of necessity and sufficiency (PNS), are important tools for decision making but are generally not point identifiable. Existing work has derived bounds for these quantities using combinations of experimental and observational data. However, there is very limited research on sample size analysis, namely, how many experimental and observational samples are required to achieve a desired margin of error. In this paper, we propose a general sample size framework based on the delta method. Our approach applies to settings in which the target bounds of PoCs can be expressed as finite minima or maxima of linear combinations of experimental and observational probabilities. Through simulation studies, we demonstrate that the proposed sample size calculations lead to stable estimation of these bounds.
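
To make "finite minima or maxima of linear combinations of experimental and observational probabilities" concrete, the sketch below evaluates the well-known Tian-Pearl bounds on PNS for binary $X$ and $Y$. It is only an illustration of the form of bound the abstract refers to, not the paper's sample size procedure; the function name `pns_bounds`, its argument names, and the numeric inputs are assumptions made for this example.

```python
def pns_bounds(p_y_do_x, p_y_do_xp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PNS for binary X and Y.

    Experimental inputs (hypothetical names):
      p_y_do_x  = P(y_x),  p_y_do_xp = P(y_{x'})
    Observational inputs (joint distribution of X and Y):
      p_xy = P(x, y), p_xyp = P(x, y'), p_xpy = P(x', y), p_xpyp = P(x', y')
    """
    p_y = p_xy + p_xpy  # observational marginal P(y)

    # Lower bound: a maximum over linear combinations of the inputs.
    lower = max(
        0.0,
        p_y_do_x - p_y_do_xp,
        p_y - p_y_do_xp,
        p_y_do_x - p_y,
    )

    # Upper bound: a minimum over linear combinations of the inputs.
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_xp,  # P(y'_{x'})
        p_xy + p_xpyp,
        p_y_do_x - p_y_do_xp + p_xyp + p_xpy,
    )
    return lower, upper


# Illustrative (made-up) probabilities, chosen to be mutually consistent.
lower, upper = pns_bounds(
    p_y_do_x=0.7, p_y_do_xp=0.2,
    p_xy=0.3, p_xyp=0.2, p_xpy=0.1, p_xpyp=0.4,
)
print(f"PNS lies in [{lower:.3f}, {upper:.3f}]")
```

With these inputs the bounds come out to roughly $[0.5, 0.7]$. In practice each input is estimated from finite samples, which is why the number of experimental and observational samples determines how accurately the two bounds can be estimated.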

Paper Structure

This paper contains 12 sections, 5 theorems, 69 equations, and 5 figures.

Key Result

Lemma 1

Consider a structural causal model with finite discrete variables, and focus on the case that $X$ and $Y$ are both binary, i.e., $X$ has two treatment levels $x, x'$ and $Y$ has two outcome levels $y, y'$. Let $\theta\in\mathbb{R}^d$ collect the observational and experimental probabilities used to [...] where the denominator $h_Q(\theta)$ is defined as: [...] $U_Q(\theta)$ and $L_Q(\theta)$ are continuous [...]

Figures (5)

  • Figure 1: The causal model for the PNS bound experiment. $X$ and $Y$ are binary, and $Z$ is a set of 20 independent binary confounders.
  • Figure 2: Scatter plots under different sample sizes for Model 1. Left to right: $n=120, 481, 1921$.
  • Figure 3: Scatter plots under different sample sizes for Model 2. Top to bottom: $n=120, 481, 1921$.
  • Figure 4: Average estimation error vs. sample size. Left: Model 1. Right: Model 2.
  • Figure 5: Average error vs. sample size over 20 replicates.

Theorems & Definitions (14)

  • Definition 1: Probability of Necessity (PN)
  • Definition 2: Probability of Sufficiency (PS)
  • Definition 3: Probability of Necessity and Sufficiency (PNS)
  • Lemma 1: Piecewise-linear and fractional forms of sharp bounds for PoCs
  • proof
  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Theorem 2
  • ...and 4 more
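
For reference, Definitions 1 to 3 are listed above by name only. The formulas below are the standard counterfactual definitions from Pearl's framework for binary $X \in \{x, x'\}$ and $Y \in \{y, y'\}$; they are given here on the assumption that the paper follows this usual convention, and its exact notation may differ.

$$\mathrm{PN} = P(Y_{x'} = y' \mid X = x,\ Y = y)$$

$$\mathrm{PS} = P(Y_{x} = y \mid X = x',\ Y = y')$$

$$\mathrm{PNS} = P(Y_{x} = y,\ Y_{x'} = y')$$

Here $Y_x$ denotes the potential outcome of $Y$ under the intervention $X = x$. PNS is the probability that $y$ occurs under treatment $x$ and $y'$ occurs under $x'$, which is why it generally cannot be point identified from data and is bounded instead.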