
Detecting Weak Distribution Shifts via Displacement Interpolation

YoonHaeng Hur, Tengyuan Liang

Abstract

Detecting weak, systematic distribution shifts and quantitatively modeling individual, heterogeneous responses to policies or incentives have found increasing empirical applications in the social and economic sciences. Given two probability distributions $P$ (null) and $Q$ (alternative), we study the problem of detecting a weak distribution shift that deviates from the null $P$ toward the alternative $Q$, where the level of deviation vanishes as a function of the sample size $n$. We propose a model for weak distribution shifts via displacement interpolation between $P$ and $Q$, drawing on optimal transport theory. We study a hypothesis testing procedure based on the Wasserstein distance, derive sharp conditions under which detection is possible, and provide an exact characterization of the asymptotic Type I and Type II errors at the detection boundary using empirical processes. We demonstrate how the proposed testing procedure works in modeling and detecting weak distribution shifts in real data sets using two empirical examples: distribution shifts in consumer spending after COVID-19, and heterogeneity in the published p-values of statistical tests in journals across different disciplines.
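
As a rough illustration of the model described in the abstract, the sketch below builds an idealized sample from the displacement interpolation between $P$ and $Q$ and measures its Wasserstein-2 distance to the null. The choices $P = \mathrm{Uniform}(0, 1)$ and $G^{-1}(u) = u^2$, all function names, and the plug-in quantile formula are assumptions made for this example only; the paper's actual test statistic and its calibration are not reproduced here.

```python
import numpy as np

def null_quantile(u):
    """F^{-1} for the assumed null P = Uniform(0, 1)."""
    return np.asarray(u, dtype=float)

def alt_quantile(u):
    """G^{-1}(u) = u**2, an assumed alternative Q supported on [0, 1]."""
    return np.asarray(u, dtype=float) ** 2

def w2_to_null(sample, quantile_fn):
    """Plug-in W2 distance between a 1-D sample and a reference law.

    Sorts the sample and compares it with the reference quantiles
    evaluated at the midpoints (i + 0.5) / n.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    u = (np.arange(x.size) + 0.5) / x.size
    return float(np.sqrt(np.mean((x - quantile_fn(u)) ** 2)))

n = 2000
u_grid = (np.arange(n) + 0.5) / n

# W2(P, Q) via the quantile representation; analytically sqrt(1/30) here.
w2_pq = float(np.sqrt(np.mean((null_quantile(u_grid) - alt_quantile(u_grid)) ** 2)))

def shifted_points(eps):
    """Idealized sample: quantiles of the displacement interpolation at level eps."""
    return (1.0 - eps) * null_quantile(u_grid) + eps * alt_quantile(u_grid)

for eps in (0.0, 0.05, 0.2):
    stat = w2_to_null(shifted_points(eps), null_quantile)
    print(f"eps={eps:.2f}  W2 to null={stat:.5f}  eps*W2(P,Q)={eps * w2_pq:.5f}")
```

The two printed distances agree because the Wasserstein-2 distance from $P$ grows linearly in $\epsilon$ along the displacement interpolation. With a genuinely random sample, sampling noise would be added on top of this signal, which is what makes a vanishing $\epsilon$ hard to detect.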

Paper Structure (31 sections, 6 theorems, 50 equations, 6 figures, 1 table)

This paper contains 31 sections, 6 theorems, 50 equations, 6 figures, 1 table.

Key Result

Theorem 1

Suppose $F$ has a density $f$ that is continuous and bounded away from $0$ on some compact interval $I_F$ and is $0$ on $\mathbb{R} \backslash I_F$, $G^{-1}$ is bounded on $(0, 1)$, and $G^{-1} \circ F$ is Lipschitz. Let $\omega$ be a finite Borel measure on $(0, 1)$ that is absolutely continuous wi where $(\mathbf{B}_u)_{u \in [0, 1]}$ is the standard Brownian bridge. Fix $\alpha \in (0, 1)$ and

Figures (6)

  • Figure 1: Illustration of two interpolation schemes. Linear interpolation $(1 - \epsilon) P + \epsilon Q$ vertically combines the cumulative distribution functions $F$ and $G$; namely, its cumulative distribution function (blue, dashed) is $(1 - \epsilon) F + \epsilon G$. Meanwhile, displacement interpolation (red, solid) horizontally combines $F$ and $G$, or equivalently, vertically combines the quantile functions $F^{-1}$ and $G^{-1}$. In other words, the quantile function of $((1 - \epsilon) \mathrm{Id} + \epsilon G^{-1} \circ F)_\# P$ is $(1 - \epsilon) F^{-1} + \epsilon G^{-1}$.
  • Figure 2: (a) plots a time series of consumer spending from January 13, 2020 to June 5, 2022, where some noticeable events are marked: March 13, 2020 (national emergency declared) and April 15, 2020, January 4, 2021, and March 17, 2021 (first, second, and third stimulus payments start, respectively). (b) and (c) show the smoothed histograms of monthly average consumer spending by county from March 16, 2020 to March 15, 2021 and from March 16, 2021 to March 15, 2022, respectively.
  • Figure 3: (a) plots the relative Wasserstein-2 distances $\epsilon_t = \frac{W_2(P_0, P_t)}{W_2(P_0, P_1)}$ and the relative TV distances $\gamma_t = \frac{TV(P_0, P_t)}{TV(P_0, P_1)}$ for $t \in \{0, 1 / 11, \ldots, 1\}$, with the 45 degree line shown as a dashed line; relative Wasserstein-1 distances $\frac{W_1(P_0, P_t)}{W_1(P_0, P_1)}$ are plotted for reference as well, which almost coincide with $\epsilon_t$. (b) and (c) show displacement interpolation $Q_t^{\mathrm{dis}} = ((1 - \epsilon_t) \mathrm{Id} + \epsilon_t T)_\# P_0$ and linear interpolation $Q_t^{\mathrm{lin}} = (1 - \gamma_t) P_0 + \gamma_t P_1$, respectively.
  • Figure 4: (a) visualizes the sum of Type I and Type II errors of Experiment 1; the red dashed line represents the level $\alpha = 0.05$. (b) plots Type II errors of Experiment 2 as a color map, where the red solid curves represent $\gamma \cdot \Delta = \text{constant} \in \{0.2, 0.45, 0.7\}$.
  • Figure 5: (a) and (b) show the powers of the proposed testing procedure and the KS test in Example I and Example II, respectively; solid lines correspond to the proposed testing procedure, whereas dotted lines represent the KS test. (c) shows the powers of the proposed testing procedures in the setting of Example II, using the weight measure given by \ref{eq:quadratic_weight}. By design, the solid lines of (b) and (c) coincide; both correspond to the Lebesgue measure, namely, $a = 0$ in \ref{eq:quadratic_weight}.
  • ...and 1 more figure
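
The two interpolation schemes contrasted in the Figure 1 caption can be sketched numerically. The example below assumes $P = \mathrm{Uniform}(0, 1)$ and $Q = \mathrm{Uniform}(2, 3)$ purely for illustration, and every function name is hypothetical. It checks that pushing $P$ forward through $(1 - \epsilon) \mathrm{Id} + \epsilon G^{-1} \circ F$ yields the quantile function $(1 - \epsilon) F^{-1} + \epsilon G^{-1}$, while linear interpolation mixes the cumulative distribution functions instead.

```python
import numpy as np

# Assumed example laws: P = Uniform(0, 1), Q = Uniform(2, 3).
def f_cdf(x):
    """CDF F of P."""
    return np.clip(np.asarray(x, dtype=float), 0.0, 1.0)

def g_cdf(x):
    """CDF G of Q."""
    return np.clip(np.asarray(x, dtype=float) - 2.0, 0.0, 1.0)

def f_inv(u):
    """Quantile function F^{-1} of P."""
    return np.asarray(u, dtype=float)

def g_inv(u):
    """Quantile function G^{-1} of Q."""
    return 2.0 + np.asarray(u, dtype=float)

def linear_cdf(x, eps):
    """CDF of the mixture (1 - eps) P + eps Q: a vertical blend of F and G."""
    return (1.0 - eps) * f_cdf(x) + eps * g_cdf(x)

def displacement_quantile(u, eps):
    """Quantile function of the displacement interpolation: a blend of F^{-1} and G^{-1}."""
    return (1.0 - eps) * f_inv(u) + eps * g_inv(u)

def push_forward(x, eps):
    """Apply the map (1 - eps) Id + eps G^{-1} o F to points x in the support of P."""
    return (1.0 - eps) * np.asarray(x, dtype=float) + eps * g_inv(f_cdf(x))

eps = 0.1
u = np.linspace(0.05, 0.95, 19)

# Transporting the quantiles of P reproduces the blended quantile function.
moved = push_forward(f_inv(u), eps)
print(np.allclose(moved, displacement_quantile(u, eps)))

# Displacement interpolation shifts all mass slightly (here to [0.2, 1.2]),
# whereas linear interpolation leaves 90% of the mass on [0, 1] and places 10% on [2, 3].
print(displacement_quantile(np.array([0.0, 1.0]), eps))
print(linear_cdf(np.array([1.0, 2.0, 3.0]), eps))
```

In this toy setting the displacement interpolation moves every point a little, while the mixture moves a small fraction of points a lot, which is the distinction the caption draws between horizontal and vertical combination.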

Theorems & Definitions (18)

  • Definition 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Lemma 1
  • Remark 3
  • Remark 4
  • Lemma 2
  • Proposition 1: Limit under the null
  • Theorem 2: Limit under the alternatives
  • ...and 8 more