General sample size analysis for probabilities of causation: a delta method approach

Tianyuan Cheng; Ruirui Mao; Judea Pearl; Ang Li

Abstract

Probabilities of causation (PoCs), such as the probability of necessity and sufficiency (PNS), are important tools for decision making but are generally not point identifiable. Existing work has derived bounds for these quantities using combinations of experimental and observational data. However, there is very limited research on sample size analysis, namely, how many experimental and observational samples are required to achieve a desired margin of error. In this paper, we propose a general sample size framework based on the delta method. Our approach applies to settings in which the target bounds of PoCs can be expressed as finite minima or maxima of linear combinations of experimental and observational probabilities. Through simulation studies, we demonstrate that the proposed sample size calculations lead to stable estimation of these bounds.
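The sketch below illustrates the delta-method idea under simplifying assumptions. A bound of the form $\max_k (c_k^\top \theta + b_k)$ is estimated by plugging in cell frequencies from a single multinomial sample of size $n$. Away from ties, the bound is differentiable and its gradient is the coefficient vector $c_k$ of the active piece. Its asymptotic variance is therefore $c_k^\top \Sigma c_k / n$ with $\Sigma = \mathrm{diag}(\theta) - \theta\theta^\top$. Requiring a margin of error $\varepsilon$ at confidence $1-\alpha$ then gives $n = \lceil z_{1-\alpha/2}^2 \, c_k^\top \Sigma c_k / \varepsilon^2 \rceil$. The helper `delta_method_sample_size` and its arguments are hypothetical. The paper's framework combines separate experimental and observational samples and may differ in its details.

```python
import math
from statistics import NormalDist

import numpy as np


def delta_method_sample_size(theta, coefs, offsets, margin, alpha=0.05, use_max=True):
    """Hypothetical helper: delta-method sample size for the plug-in estimate of
    max_k (coefs[k] @ theta + offsets[k])  (or min_k when use_max=False).

    theta   : cell probabilities of one multinomial sample (sums to 1)
    coefs   : (K, d) array, one row of linear coefficients per piece
    offsets : (K,) array of constants (e.g. the trivial bound 0)
    margin  : desired half-width of the (1 - alpha) confidence interval
    """
    theta = np.asarray(theta, dtype=float)
    coefs = np.atleast_2d(np.asarray(coefs, dtype=float))
    values = coefs @ theta + np.asarray(offsets, dtype=float)
    # Active piece: assumes a unique maximizer/minimizer (no ties), where the
    # bound is differentiable and its gradient equals that piece's coefficients.
    k = int(np.argmax(values)) if use_max else int(np.argmin(values))
    grad = coefs[k]
    # Per-observation covariance of the multinomial cell-frequency estimator.
    sigma = np.diag(theta) - np.outer(theta, theta)
    var_one = float(grad @ sigma @ grad)
    # Two-sided normal quantile z_{1 - alpha/2}.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * var_one / margin**2)
```

For example, with $\theta = (0.6, 0.4)$ and the bound $\max\{0, \theta_1 - \theta_2\}$, `delta_method_sample_size([0.6, 0.4], [[0, 0], [1, -1]], [0, 0], margin=0.05)` returns 1476. At ties between pieces, the bound is not differentiable, and this plain delta-method formula does not apply.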

Paper Structure (12 sections, 5 theorems, 69 equations, 5 figures)

Introduction
Preliminaries
Main Results
Simulation
Experiment: estimating PNS bound
Discussion
Conclusion
Appendix
Calculation of sample size in Experiment 1, PNS bound
Coefficients of Experiments
Model 1
Model 2

Key Result

Lemma 1

Consider a structural causal model with discrete variables taking finitely many values, and focus on the case in which $X$ and $Y$ are both binary, i.e., $X$ has two treatment levels $x, x'$ and $Y$ has two outcome levels $y, y'$. Let $\theta\in\mathbb{R}^d$ collect the observational and experimental probabilities used to […] where the denominator $h_Q(\theta)$ is defined as: […] $U_Q(\theta)$ and $L_Q(\theta)$ are continuous […]
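As a concrete instance of the piecewise-linear and fractional forms named in Lemma 1, the sketch below evaluates the well-known Tian and Pearl bounds for binary $X$ and $Y$ under combined experimental and observational data. The PNS bounds are finite maxima and minima of linear combinations of the probabilities. The PN bounds divide each nontrivial piece by $P(x,y)$, which is one example of a fractional form. We do not claim these are exactly the bounds or the denominator $h_Q(\theta)$ used in the paper. The function names and argument layout are hypothetical.

```python
def pns_bounds(p_yx, p_yxp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PNS from experimental and observational data.

    p_yx  = P(y_x),    experimental P(Y=y | do(X=x))
    p_yxp = P(y_{x'}), experimental P(Y=y | do(X=x'))
    p_xy, p_xyp, p_xpy, p_xpyp = observational P(x,y), P(x,y'), P(x',y), P(x',y')
    """
    p_y = p_xy + p_xpy          # observational P(y)
    p_ypxp = 1.0 - p_yxp        # P(y'_{x'}) for binary Y
    # Lower bound: a finite maximum of linear combinations of the probabilities.
    lower = max(0.0, p_yx - p_yxp, p_y - p_yxp, p_yx - p_y)
    # Upper bound: a finite minimum of linear combinations of the probabilities.
    upper = min(p_yx, p_ypxp, p_xy + p_xpyp, p_yx - p_yxp + p_xyp + p_xpy)
    return lower, upper


def pn_bounds(p_yxp, p_xy, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PN; each nontrivial piece is divided by P(x,y)."""
    p_y = p_xy + p_xpy
    lower = max(0.0, (p_y - p_yxp) / p_xy)
    upper = min(1.0, ((1.0 - p_yxp) - p_xpyp) / p_xy)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical probabilities, chosen to satisfy the consistency constraints
    # P(x,y) <= P(y_x) <= 1 - P(x,y') and P(x',y) <= P(y_{x'}) <= 1 - P(x',y').
    print(pns_bounds(0.6, 0.35, 0.25, 0.15, 0.2, 0.4))  # approximately (0.25, 0.6)
    print(pn_bounds(0.35, 0.25, 0.2, 0.4))              # approximately (0.4, 1.0)
```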

Figures (5)

Figure 1: The causal model for the PNS bound experiment. X and Y are binary, and Z is a set of 20 independent binary confounders.
Figure 2: Scatter plots under different sample sizes for Model 1. Left to right: $n=120, 481, 1921$.
Figure 3: Scatter plots under different sample sizes for Model 2. Top to bottom: $n=120, 481, 1921$.
Figure 4: Average estimation error vs. sample size. Left: Model 1; right: Model 2.
Figure 5: Average error vs. sample size over 20 replicates.
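Figures 4 and 5 plot average estimation error against sample size. The sketch below shows one way to produce such a curve for the plug-in PNS lower bound. It draws binomial samples from hypothetical true probabilities, recomputes the bound, and averages the absolute error over 20 replicates. These probabilities are not the paper's Model 1 or Model 2, which involve 20 binary confounders. The design is also an assumption: each experimental arm and the observational sample have the same size $n$. The function names are hypothetical as well. The sizes 120, 481, 1921 are borrowed from the figure captions only as grid points, and the output is not meant to reproduce the paper's results.

```python
import numpy as np


def pns_lower(p_yx, p_yxp, p_y):
    # Tian-Pearl lower bound on PNS: a finite maximum of linear combinations.
    return max(0.0, p_yx - p_yxp, p_y - p_yxp, p_yx - p_y)


def average_error(n, true, reps=20, seed=0):
    """Mean absolute error of the plug-in PNS lower bound over `reps` replicates.

    Hypothetical design: n units per experimental arm (do(x), do(x')) and
    n observational units, all sampled independently.
    """
    rng = np.random.default_rng(seed)
    target = pns_lower(true["p_yx"], true["p_yxp"], true["p_y"])
    errors = []
    for _ in range(reps):
        p_yx = rng.binomial(n, true["p_yx"]) / n    # experimental arm do(x)
        p_yxp = rng.binomial(n, true["p_yxp"]) / n  # experimental arm do(x')
        p_y = rng.binomial(n, true["p_y"]) / n      # observational P(y)
        errors.append(abs(pns_lower(p_yx, p_yxp, p_y) - target))
    return float(np.mean(errors))


# Hypothetical true probabilities (not taken from the paper).
true_probs = {"p_yx": 0.6, "p_yxp": 0.35, "p_y": 0.45}

if __name__ == "__main__":
    for n in (120, 481, 1921):
        print(n, average_error(n, true_probs))
```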

Theorems & Definitions (14)

Definition 1: Probability of Necessity (PN)
Definition 2: Probability of Sufficiency (PS)
Definition 3: Probability of Necessity and Sufficiency (PNS)
Lemma 1: Piecewise-linear and fractional forms of sharp bounds for PoCs
Proof
Theorem 1
Proof
Corollary 1
Proof
Theorem 2
...and 4 more
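For reference, the standard counterfactual definitions behind Definitions 1-3 are given below in Pearl's notation, where $y_x$ denotes the event $Y = y$ under the intervention $do(X = x)$. We assume, but have not confirmed, that the paper follows these conventions.

```latex
\begin{aligned}
\mathrm{PN}  &= P(y'_{x'} \mid x, y), \\
\mathrm{PS}  &= P(y_{x} \mid x', y'), \\
\mathrm{PNS} &= P(y_{x}, y'_{x'}).
\end{aligned}
```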