Sample Observed Effects: Enumeration, Randomization and Generalization

Andre F. Ribeiro

TL;DR

This paper reframes external validity of intervention effects through a combinatorial lens that emphasizes effect observations across diverse backgrounds rather than a single counterfactual. It defines EV-increasing and background-randomization concepts, introduces square-based background enumeration, and develops non-parametric tests and gap statistics to distinguish causes from confounders under unobserved variation. The approach yields theoretical and empirical insights into generalizability, confounding separability, and sample-size requirements, and demonstrates practical relevance with simulations and COVID-19 data analyses. By linking combinatorial background variation to predictive and causal inference, it provides a framework that unifies causality and machine learning perspectives and offers actionable guidance for external validity and interpretability in non-i.i.d. settings.

Abstract

The widely used 'Counterfactual' definition of Causal Effects was derived for unbiasedness and accuracy, not generalizability. We propose a Combinatorial definition for the External Validity (EV) of intervention effects. We first define the concept of an effect observation 'background'. We then formulate conditions for effect generalization based on samples' sets of (observed and unobserved) backgrounds. This reveals two limits for effect generalization: (1) when effects of a variable are observed under all their enumerable backgrounds, or (2) when backgrounds have become sufficiently randomized. We use the resulting combinatorial framework to re-examine several issues in the original counterfactual formulation: out-of-sample validity, concurrent estimation of multiple effects, bias-variance tradeoffs, statistical power, and connections to current predictive and explaining techniques. Methodologically, the definitions also allow us to replace the parametric estimation problems that followed the counterfactual definition with combinatorial enumeration and randomization problems in non-experimental samples. We use the resulting non-parametric framework to demonstrate (External Validity, Unconfoundedness and Precision) tradeoffs in the performance of popular supervised, explaining, and causal-effect estimators. We also illustrate how the approach allows for the use of supervised and explaining methods in non-i.i.d. samples. The COVID-19 pandemic highlighted the need for learning solutions to provide predictions in severely incomplete samples. We demonstrate applications in this pressing problem.

Paper Structure

This paper contains 30 sections, 3 theorems, 25 equations, 8 figures.

Key Result

Proposition 1

Effect observations of factor $a$ at $\textrm{argmin}_k X_{(k)}=a$ are a sample's minimally confounded and maximally precise individual observations of $a$'s effect, $\textrm{Var}^{-1}[\Delta y(a)]$. In a single-row square, the effect of its first factor is observed at maximum precision. All other e

Figures (8)

  • Figure 1: (a) $4{\times}4$ Latin-Square ('square') as sets of effect observations for a sample unit $x_0 \subseteq X$, (b) $3$ samples (horizontal lines, top) with increasing dimension $m = \{1,2,4\}$ and their ordinal statistics (interval ticks), as well as alternative samples of confounding unobserved factors $U$ (circles) (top), resulting distribution and percentage of enumerable permutations of effect observations across ordinal variables (bottom and right).
  • Figure 2: (a) sampling sequences in a sample, $X=\{a,b,c\}$, with confounder partition $U=\{\{u_1\}, \{u_2,u_3\}, \{u_4\}\}$, (b) normalized variation separation distance (distance to a Uniform distribution) of all backgrounds $\pi_0$ to other backgrounds after time $t$, vertical dashed lines mark limits on successive background randomness based on their number of inversions, Eq. (\ref{eq-inv}) (top panel shows the same without a test of stationarity on gap distributions $p$), (c) randomization in graphical models with in-sample treatment $A$, other factors $W$, unobserved factors $U$, and outcome $Y$, (d) combinatorial relations in a square ($m=2$), circles are differences and gray circles are inter-unit factor value intersections (letters) between $x_0$ and other units.
  • Figure 3: (a) observed permutations and limits ($m=10$, balanced, log-scale); (b) square sample-unit histograms, increasing $n$.
  • Figure 4: supervised ACC vs. $n$ for the (a) unbalanced and (b) balanced cases; (c) ACC under different transversals; (d) a square transversal.
  • Figure 5: supervised ACC vs. $n$ for the (a) correlated, $\rho=\{0.1,0.25,0.5\}$, and (b) counterfactual prediction cases.
  • ...and 3 more figures
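
The captions for Figures 1 and 2 refer to Latin squares, permutation inversions, and a distance to a Uniform distribution. The following is a minimal Python sketch of these three generic objects, not the paper's implementation. The function names are hypothetical, the cyclic construction is only one of many ways to build a Latin square, and plain total variation distance is assumed here as a stand-in for the paper's normalized separation distance.

```python
def cyclic_latin_square(factors):
    """Build an m x m Latin square: row i is `factors` rotated left by i.
    (Assumption: a cyclic construction, chosen only for illustration.)"""
    m = len(factors)
    return [[factors[(i + j) % m] for j in range(m)] for i in range(m)]


def is_latin_square(square):
    """Check that every symbol appears exactly once per row and per column."""
    m = len(square)
    symbols = set(square[0])
    if len(symbols) != m:
        return False
    rows_ok = all(len(row) == m and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == symbols for j in range(m))
    return rows_ok and cols_ok


def inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def tv_distance_to_uniform(counts):
    """Total variation distance between the empirical distribution given by
    `counts` and the uniform distribution over the same categories.
    (Assumption: a stand-in for the paper's normalized distance.)"""
    total = sum(counts)
    k = len(counts)
    return 0.5 * sum(abs(c / total - 1.0 / k) for c in counts)


square = cyclic_latin_square(["a", "b", "c", "d"])
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
print(inversions((2, 1, 0)))
print(tv_distance_to_uniform([4, 0, 0, 0]))
```

In this sketch each row of the square lists the same four factors in a different order, so each factor occupies every position exactly once across the rows. The distance is 0 when all categories are observed equally often and grows as observations concentrate on a few of them.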

Theorems & Definitions (12)

  • Example 1: $U{-}X$ Linear Extensions
  • Example 2: $U{-}X$ Data-Generating Order Relations
  • Example 3: Iterative higher-order effects
  • Proposition 1: Sample Ordinal Statistics and Effect Variance
  • Proposition 2: Full-Observability and Effect Variance
  • Definition 1: Sampling Sequence
  • Definition 2: EV-increasing sequence
  • Definition 3: CF non-increasing sequence
  • Definition 4: EV-CF sequences
  • Definition 5: $U{-}X$ Effect Background Enumeration
  • ...and 2 more