Simpson's Paradox with Any Given Number of Factors

Guisheng Dai, Weizhen Wang

TL;DR

The paper addresses the extension of Simpson's Paradox to an arbitrary number $n$ of factors, defining the $n$-factor Simpson's Paradox and proving the existence of probability distributions where the inferred effect of $A$ on $X$ alternates as each additional factor $B_i$ is observed. It introduces a geometric, inductive construction that decomposes probability vectors into positive components and uses angle arguments to realize the desired inequalities, yielding $2^n-1$ paradoxes. A detailed $n=3$ example demonstrates three successive reversals with explicit numerical decompositions, and accompanying R code is provided for replication. The work highlights the non-monotonicity of statistical inference in the presence of multiple confounders and discusses extensions to finite-level factors and observational study implications.
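The alternating pattern described above can be checked mechanically. The sketch below is our own illustration, not the paper's R code. It assumes binary $X$ and binary factors $B_1,\dots,B_n$, stores each stratum as counts of $(X=1, X=0)$ under $A$ and under not-$A$, and requires the sign of the rate difference to flip at every split. The names `Cell`, `rate`, `sign`, and `verify` are hypothetical, and the counts form a hypothetical $n=2$ example rather than data from the paper. Because each internal cell of a depth-$n$ binary tree contributes one reversal, the total is $1+2+\dots+2^{n-1}=2^n-1$, which is one way to read the $2^n-1$ paradoxes mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Cell:
    """One stratum: counts (X=1, X=0) under A and under not-A, plus sub-strata."""
    a: tuple[int, int]
    not_a: tuple[int, int]
    children: list[Cell] = field(default_factory=list)


def rate(counts: tuple[int, int]) -> Fraction:
    """Exact estimate of P(X=1) from (X=1, X=0) counts."""
    ones, zeros = counts
    return Fraction(ones, ones + zeros)


def sign(cell: Cell) -> int:
    """+1 if A looks better than not-A in this stratum, -1 if worse, 0 if tied."""
    diff = rate(cell.a) - rate(cell.not_a)
    return (diff > 0) - (diff < 0)


def verify(cell: Cell) -> int:
    """Count the reversals below `cell`; raise ValueError if any split fails to
    reverse the comparison or if sub-strata do not add up to their parent.
    (This sketch does not check that all leaves sit at the same depth.)"""
    if not cell.children:
        return 0
    s = sign(cell)
    if s == 0:
        raise ValueError("tie: no direction to reverse")
    sum_a = [0, 0]
    sum_not_a = [0, 0]
    reversals = 1  # this split is one Simpson's reversal
    for child in cell.children:
        if sign(child) != -s:
            raise ValueError("comparison does not reverse in a sub-stratum")
        for i in range(2):
            sum_a[i] += child.a[i]
            sum_not_a[i] += child.not_a[i]
        reversals += verify(child)
    if tuple(sum_a) != cell.a or tuple(sum_not_a) != cell.not_a:
        raise ValueError("sub-strata do not sum to the parent stratum")
    return reversals


# Hypothetical n = 2 example: A is worse overall, better within each level of B1,
# and worse again within each level of (B1, B2).
tree = Cell((390, 710), (100, 100), [
    Cell((90, 10), (80, 20), [          # B1 = 1: 0.9 vs 0.8
        Cell((5, 5), (61, 19)),         # B2 = 1: 0.5 vs 0.7625
        Cell((85, 5), (19, 1)),         # B2 = 2: 0.944 vs 0.95
    ]),
    Cell((300, 700), (20, 80), [        # B1 = 2: 0.3 vs 0.2
        Cell((80, 670), (11, 79)),      # B2 = 1: 0.107 vs 0.122
        Cell((220, 30), (9, 1)),        # B2 = 2: 0.88 vs 0.9
    ]),
])

print(verify(tree))  # → 3
```

Exact `Fraction` arithmetic is used so that close comparisons such as $85/90$ versus $19/20$ are decided without rounding error.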

Abstract

Simpson's Paradox is a well-known phenomenon in statistical science, where the relationship between the response variable $X$ and a certain explanatory factor of interest $A$ reverses when an additional factor $B_1$ is considered. This paper explores the extension of Simpson's Paradox to any given number $n$ of factors, referred to as the $n$-factor Simpson's Paradox. We first provide a rigorous definition of the $n$-factor Simpson's Paradox, then demonstrate the existence of a probability distribution through a geometric construction. Specifically, we show that for any positive integer $n$, it is possible to construct a probability distribution in which the conclusion about the effect of $A$ on $X$ reverses each time an additional factor $B_i$ is introduced for $i=1,...,n$. A detailed example for $n = 3$ illustrates the construction. Our results highlight that, contrary to the intuition that more data leads to more accurate inferences, the inclusion of additional factors can repeatedly reverse conclusions, emphasizing the complexity of statistical inference in the presence of multiple confounding variables.
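For the one-factor case described above, a short derivation shows why a reversal is possible at all. This is our own illustration, assuming binary $X$ and a two-level $B_1$; we write $\bar{A}$ for the absence of $A$ (our notation), and the numbers are hypothetical. The marginal rate is a weighted average of the stratum rates, and the weights $P(B_1=b \mid A)$ and $P(B_1=b \mid \bar{A})$ need not agree.

```latex
% Law of total probability over the two levels b = 1, 2 of B_1
P(X=1 \mid A) = \sum_{b=1}^{2} P(B_1=b \mid A)\, P(X=1 \mid A, B_1=b)
P(X=1 \mid \bar{A}) = \sum_{b=1}^{2} P(B_1=b \mid \bar{A})\, P(X=1 \mid \bar{A}, B_1=b)

% Hypothetical values: A is better within each level of B_1 ...
P(X=1 \mid A, B_1=1) = 0.9 > 0.8 = P(X=1 \mid \bar{A}, B_1=1)
P(X=1 \mid A, B_1=2) = 0.3 > 0.2 = P(X=1 \mid \bar{A}, B_1=2)

% ... but the weights differ: P(B_1=1 \mid A) = 1/11, P(B_1=1 \mid \bar{A}) = 1/2
P(X=1 \mid A) = \tfrac{1}{11}(0.9) + \tfrac{10}{11}(0.3) = \tfrac{3.9}{11} \approx 0.355
P(X=1 \mid \bar{A}) = \tfrac{1}{2}(0.8) + \tfrac{1}{2}(0.2) = 0.5
% so marginally A looks worse: 0.355 < 0.5
```

Most of $A$'s weight sits in the low-rate level $B_1=2$, which pulls its marginal rate below that of $\bar{A}$ even though $A$ wins in both levels.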

Paper Structure

This paper contains 4 sections, 4 theorems, 45 equations, 1 figure, 1 table.

Key Result

Lemma 1

For any four positive constants $x_1, x_2, y_1$, and $y_2$, let $\theta_x$ and $\theta_y$ be the angles of the vectors $(x_2,x_1)$ and $(y_2,y_1)$ in the plane $\mathbb{R}^2$, respectively. Then the following three statements are equivalent: i) ${x_1\over x_1+x_2}>{y_1\over y_1+y_2}$; ii) $\tan(\theta_x)>\tan(\theta_y)$; iii) …
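The equivalence of i) and ii) in Lemma 1 follows from a one-line calculation: since all four constants are positive, ${x_1\over x_1+x_2}>{y_1\over y_1+y_2} \iff x_1 y_2 > y_1 x_2 \iff x_1/x_2 > y_1/y_2$, and $\tan(\theta_x) = x_1/x_2$, $\tan(\theta_y) = y_1/y_2$. The sketch below is our own numerical sanity check of that step (the helper names `statement_i`, `statement_ii`, and `check_equivalence` are hypothetical); it does not address the truncated statement iii).

```python
import math
import random


def statement_i(x1, x2, y1, y2):
    """Rate comparison: x1/(x1+x2) > y1/(y1+y2)."""
    return x1 / (x1 + x2) > y1 / (y1 + y2)


def statement_ii(x1, x2, y1, y2):
    """Angle comparison: tan(theta_x) > tan(theta_y), where theta_x is the
    angle of the vector (x2, x1), i.e. atan2(x1, x2)."""
    theta_x = math.atan2(x1, x2)
    theta_y = math.atan2(y1, y2)
    return math.tan(theta_x) > math.tan(theta_y)


def check_equivalence(trials=10_000, seed=0):
    """Compare i) and ii) on random positive integers; return how many
    non-tied cases were checked (an assertion fails on any disagreement)."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        x1, x2, y1, y2 = (rng.randint(1, 1000) for _ in range(4))
        if x1 * y2 == y1 * x2:
            continue  # exact tie: skip, since float tan() could break it arbitrarily
        assert statement_i(x1, x2, y1, y2) == statement_ii(x1, x2, y1, y2)
        checked += 1
    return checked


print(check_equivalence())
```

Integer inputs keep the non-tied ratios far enough apart that floating-point error in `math.tan(math.atan2(...))` cannot flip a comparison.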

Figures (1)

  • Figure 1: Construction of Simpson's Paradox. The top six graphs illustrate Lemma \ref{lem-simpson-1} and the bottom six illustrate Lemma \ref{lem-simpson-1-1}.

Theorems & Definitions (5)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 1