Nonsense associations in Markov random fields with pairwise dependence

Sohom Bhattacharya, Rajarshi Mukherjee, Elizabeth Ogburn

Abstract

Yule (1926) identified the issue of "nonsense correlations" in time series data, where dependence within each of two random vectors causes overdispersion -- i.e. variance inflation -- for measures of dependence between the two. During the near century since then, much has been written about nonsense correlations -- but nearly all of it confined to the time series literature. In this paper we provide the first, to our knowledge, rigorous study of this phenomenon for more general forms of (positive) dependence, specifically for Markov random fields on lattices and graphs. We consider both binary and continuous random vectors and three different measures of association: correlation, covariance, and the ordinary least squares coefficient that results from projecting one random vector onto the other. In some settings we find variance inflation consistent with Yule's nonsense correlation. However, surprisingly, we also find variance deflation in some settings, and in others the variance is unchanged under dependence. Perhaps most notably, we find general conditions under which OLS inference that ignores dependence is valid despite positive dependence in the regression errors, contradicting the presentation of OLS in countless textbooks and courses.
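The variance inflation described above can be illustrated in Yule's original time series setting. The following is a minimal sketch, not the paper's simulation design: it assumes two independent stationary AR(1) series (the coefficient `phi`, the sample size, and the replicate count are illustrative choices, and all function names are hypothetical) and compares the Monte Carlo variance of $\sqrt{n}$ times the sample correlation with and without dependence within each series.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Stationary AR(1): x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1)."""
    # Start from the stationary distribution N(0, 1 / (1 - phi^2)).
    x = rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5
    out = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def sample_correlation(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def scaled_correlation_variance(phi, n=200, reps=400, seed=0):
    """Monte Carlo variance of sqrt(n) * correlation for independent X and Y."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xs = ar1_series(n, phi, rng)  # X and Y are generated independently,
        ys = ar1_series(n, phi, rng)  # so the true correlation is zero.
        draws.append(n ** 0.5 * sample_correlation(xs, ys))
    return statistics.pvariance(draws)


var_iid = scaled_correlation_variance(phi=0.0)  # no dependence within series
var_dep = scaled_correlation_variance(phi=0.7)  # positive dependence within series
print(f"i.i.d. variance:    {var_iid:.2f}")
print(f"dependent variance: {var_dep:.2f}")
```

Under these assumptions the i.i.d. variance should sit near 1, while the dependent case should be noticeably larger, which is why a naive test of zero correlation over-rejects. The Markov random field settings studied in the paper are more general, and, as the abstract notes, they do not always produce inflation.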

Paper Structure (16 sections, 12 theorems, 64 equations, 5 figures)

This paper contains 16 sections, 12 theorems, 64 equations, 5 figures.

Key Result

Theorem 1

Suppose $\mathbf{X}\sim \mathbb{P}_{\beta_1, \mathbf{Q}(\Lambda_{n,d})}$, $\mathbf{Y}\sim \mathbb{P}_{\beta_2, \mathbf{Q}(\Lambda_{n,d})}$ for some $\beta_1,\beta_2<\beta_c(d)$ with $\mathbf{X}$ independent of $\mathbf{Y}$. Then, there exists $v= v(\beta_1,\beta_2) \in \mathbb{R}^+$ such that $\sqrt

Figures (5)

  • Figure 1: Asymptotic variance of $\sqrt{n}T_n$ when $\beta_1=\beta_2=\beta$.
  • Figure 2: Empirical distribution of $\sqrt{n}\rho_n$ over 1000 simulation replicates with $n=10,000$. The inverse temperature parameter $\beta$ ranges from $0$ (i.i.d. data) to the critical value. The standard deviation of the distribution increases monotonically with $\beta$, as does the type I error rate $\alpha$ for a naive null hypothesis test of $\rho=0$.
  • Figure 3: Empirical distribution of $\sqrt{n}T_n$ over 1000 simulation replicates with $n=10,000$. The inverse temperature parameter $\beta$ ranges from $0$ (i.i.d. data) to the critical value. The standard deviation of the distribution increases monotonically with $\beta$, as does the type I error rate $\alpha$ for a naive null hypothesis test of $T=0$.
  • Figure 4: Empirical distribution of $\sqrt{n}\rho_n$ over 1000 simulation replicates with $n=200$, where $\mathbf{X},\mathbf{Y} \sim N(0,\Sigma)$ and $\widetilde{\lambda_1}\gg n\widetilde{\lambda_2}$.
  • Figure 5: Each column corresponds to $95\%$ naive OLS confidence intervals for $\beta$ in the OLS regression of $Y$ onto $X$. In (a) and (b) the eigenvalues of $\Sigma_X$ and $\Sigma_Y$ are both increasing; in (c) and (d) the eigenvalues for $\Sigma_X$ are decreasing and $\Sigma_Y$ increasing. In (e) $\Sigma_X$ is the identity matrix. Each column represents $500$ simulated replications of sample size $n=200$.

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Theorem 4
  • Theorem 5
  • Corollary 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 3 more