Sample Observed Effects: Enumeration, Randomization and Generalization

Andre F. Ribeiro

TL;DR

This paper reframes external validity of intervention effects through a combinatorial lens that emphasizes effect observations across diverse backgrounds rather than a single counterfactual. It defines EV-increasing and background-randomization concepts, introduces square-based background enumeration, and develops non-parametric tests and gap statistics to distinguish causes from confounders under unobserved variation. The approach yields theoretical and empirical insights into generalizability, confounding separability, and sample-size requirements, and demonstrates practical relevance with simulations and COVID-19 data analyses. By linking combinatorial background variation to predictive and causal inference, it provides a framework that unifies causality and machine learning perspectives and offers actionable guidance for external validity and interpretability in non-i.i.d. settings.

Abstract

The widely used 'Counterfactual' definition of Causal Effects was derived for unbiasedness and accuracy, not generalizability. We propose a Combinatorial definition for the External Validity (EV) of intervention effects. We first define the concept of an effect observation 'background'. We then formulate conditions for effect generalization based on samples' sets of (observed and unobserved) backgrounds. This reveals two limits for effect generalization: (1) when effects of a variable are observed under all their enumerable backgrounds, or (2) when backgrounds have become sufficiently randomized. We use the resulting combinatorial framework to re-examine several issues in the original counterfactual formulation: out-of-sample validity, concurrent estimation of multiple effects, bias-variance tradeoffs, statistical power, and connections to current predictive and explaining techniques. Methodologically, the definitions also allow us to replace the parametric estimation problems that followed the counterfactual definition with combinatorial enumeration and randomization problems in non-experimental samples. We use the resulting non-parametric framework to demonstrate (External Validity, Unconfoundedness and Precision) tradeoffs in the performance of popular supervised, explaining, and causal-effect estimators. We also illustrate how the approach allows for the use of supervised and explaining methods in non-i.i.d. samples. The COVID-19 pandemic highlighted the need for learning solutions that provide predictions in severely incomplete samples. We demonstrate applications to this pressing problem.

Figures (8)

Theorems & Definitions (12)