Simpson's Paradox with Any Given Number of Factors
Guisheng Dai, Weizhen Wang
TL;DR
The paper extends Simpson's Paradox to an arbitrary number $n$ of factors. It defines the $n$-factor Simpson's Paradox and proves that there exist probability distributions in which the inferred effect of $A$ on $X$ reverses each time an additional factor $B_i$ is observed. The proof uses a geometric, inductive construction that decomposes probability vectors into positive components and uses angle arguments to realize the desired inequalities, yielding $2^n-1$ paradoxes. A detailed $n=3$ example demonstrates three successive reversals with explicit numerical decompositions, and accompanying R code is provided for replication. The work highlights the non-monotonicity of statistical inference in the presence of multiple confounders and discusses extensions to finite-level factors and implications for observational studies.
Abstract
Simpson's Paradox is a well-known phenomenon in statistical science, where the relationship between the response variable $X$ and a certain explanatory factor of interest $A$ reverses when an additional factor $B_1$ is considered. This paper explores the extension of Simpson's Paradox to any given number $n$ of factors, referred to as the $n$-factor Simpson's Paradox. We first provide a rigorous definition of the $n$-factor Simpson's Paradox, then demonstrate, through a geometric construction, the existence of a probability distribution that exhibits it. Specifically, we show that for any positive integer $n$, it is possible to construct a probability distribution in which the conclusion about the effect of $A$ on $X$ reverses each time an additional factor $B_i$ is introduced for $i=1,\dots,n$. A detailed example for $n = 3$ illustrates the construction. Our results highlight that, contrary to the intuition that more data leads to more accurate inferences, the inclusion of additional factors can repeatedly reverse conclusions, emphasizing the complexity of statistical inference in the presence of multiple confounding variables.
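For readers who want a concrete picture of the base case ($n = 1$), the following is a minimal sketch of a single reversal: the pooled comparison favors one level of $A$, while the comparison within every level of $B_1$ favors the other. The counts, the dictionary layout, and the helper names (`rate`, `pooled_rate`) are illustrative assumptions and are not taken from the paper; the paper's own construction is geometric and covers general $n$.

```python
from fractions import Fraction

# Illustrative counts (an assumption, not data from the paper):
# (number of cases with X = 1, total cases) for each level of A,
# split by the level of the additional factor B_1.
counts = {
    "B1=0": {"A=1": (81, 87), "A=0": (234, 270)},
    "B1=1": {"A=1": (192, 263), "A=0": (55, 80)},
}


def rate(successes, trials):
    """Exact estimate of P(X = 1) from a (successes, trials) pair."""
    return Fraction(successes, trials)


def pooled_rate(level):
    """Estimate of P(X = 1 | A = level) when B_1 is ignored."""
    successes = sum(counts[b][level][0] for b in counts)
    trials = sum(counts[b][level][1] for b in counts)
    return Fraction(successes, trials)


# Conclusion without B_1: does A=1 look better than A=0 overall?
marginal_favors_a1 = pooled_rate("A=1") > pooled_rate("A=0")

# Conclusion with B_1: does A=1 look better inside each stratum?
stratum_favors_a1 = [
    rate(*counts[b]["A=1"]) > rate(*counts[b]["A=0"]) for b in counts
]

# A reversal: every stratum favors A=1, yet the pooled data favor A=0.
is_reversal = (not marginal_favors_a1) and all(stratum_favors_a1)

print("pooled:", float(pooled_rate("A=1")), "vs", float(pooled_rate("A=0")))
for b in counts:
    print(b, float(rate(*counts[b]["A=1"])), "vs", float(rate(*counts[b]["A=0"])))
print("reversal:", is_reversal)
```

Exact fractions are used so that the comparisons do not depend on floating-point rounding. The $n$-factor setting described in the abstract would require such a reversal to occur again at each further stratification by $B_2, \dots, B_n$, which this sketch does not attempt.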
