When correcting for regression to the mean is worse than no correction at all

José F. Fontanari, Mauro Santos

TL;DR

The paper shows that the most robust approach to navigating RTM is not to correct the data but to evaluate the uncorrected crude slope against a structural null expectation derived from measurement repeatability (the proportion of total variance attributable to true individual differences).

Abstract

The ubiquitous regression to the mean (RTM) effect complicates statistical inference regarding the relationship between baseline levels of a biological variable and its subsequent change. We demonstrate that common RTM correction methods are problematic: the Berry et al. method, popularized by Kelly & Price in The American Naturalist, is unreliable for hypothesis testing or effect-size estimation, leading to systematic bias and inflated error rates. Conversely, while the Blomqvist method is theoretically unbiased, its high sampling variance limits its practical utility in small-to-moderate datasets. Using a structural linear model, we show that the most robust approach to navigating RTM is not to correct the data, but to evaluate the uncorrected crude slope against a structural null expectation derived from measurement repeatability, the proportion of total variance attributable to true individual differences. We illustrate this approach using empirical data from studies on lizard thermal physiology and bird telomere dynamics. Ultimately, we argue that any conclusion regarding a differential treatment effect is statistically unfounded without a clear understanding of the experiment's repeatability.
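To make the "structural null expectation" concrete, here is a minimal sketch of how the expected crude slope could follow from repeatability. It assumes a simple linear model that is not spelled out in this summary: true baseline variance $\gamma^2$, independent measurement-error variance $\delta^2$ on both occasions, and a structural slope $\beta$ acting on the true baseline. Under those assumptions the crude slope of change on measured baseline is $\beta R - (1 - R)$ with $R = \gamma^2 / (\gamma^2 + \delta^2)$. The function names are hypothetical, and the numeric inputs are the blood pressure values quoted in the figure captions.

```python
def repeatability(gamma: float, delta: float) -> float:
    """Share of observed baseline variance due to true individual differences.

    gamma: between-subject SD of the true state.
    delta: within-subject (measurement error) SD.
    """
    return gamma**2 / (gamma**2 + delta**2)


def expected_crude_slope(beta: float, gamma: float, delta: float) -> float:
    """Expected slope of change (x2 - x1) regressed on measured baseline x1.

    Assumed model (an illustration, not necessarily the authors' exact form):
    Cov(d, x1) = beta * gamma^2 - delta^2 and Var(x1) = gamma^2 + delta^2,
    which simplifies to beta * R - (1 - R).
    """
    r = repeatability(gamma, delta)
    return beta * r - (1.0 - r)


# Blood pressure example from the figure captions: gamma = 13.6, delta = 9.1 mmHg.
r = repeatability(13.6, 9.1)
null_slope = expected_crude_slope(0.0, 13.6, 9.1)
print(f"R = {r:.2f}, expected crude slope under beta = 0: {null_slope:.2f}")
# prints: R = 0.69, expected crude slope under beta = 0: -0.31
```

With no measurement error (`delta = 0`) the function returns `beta` itself, and as `delta` grows the value moves toward $-1$ for any `beta`, which matches the behavior of the crude slope described in the Figure 2 caption.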

Paper Structure

This paper contains 22 sections, 29 equations, 10 figures.

Figures (10)

  • Figure 1: Conceptual framework of the structural model and Regression to the Mean (RTM). (A) A Directed Acyclic Graph (DAG) illustrating the causal paths between true states ($X$) and measured values ($x$). The structural estimand $\beta$ represents the true biological effect of the initial state on change. Measurement errors ($\epsilon_1, \epsilon_2$) and biological noise ($\zeta$) create the observed values, while the change score $d$ is derived mathematically. (B) A simulation of $N=100$ individuals where the true structural effect is zero ($\beta = 0$). Parameters are based on a systolic blood pressure model system: $\mu = 141$ mmHg, between-subject SD $\gamma = 13.6$ mmHg, within-subject SD $\delta = 9.1$ mmHg, and stochastic biological noise $\nu = 10$ mmHg. These values result in a repeatability of $R \approx 0.69$. The dashed purple line represents the theoretical identity line (unit slope), corresponding to the idealized case where both the systematic effect and measurement error are null ($\beta = 0, \delta = 0$). The solid green line represents the observed regression ($\beta_c$), which is tilted due to RTM. This illustrates how measurement error alone creates a misleading statistical association even in the absence of a systematic biological trade-off.
  • Figure 2: Comparison of estimated slopes as a function of measurement noise. The three panels illustrate the behavior of the crude regression slope ($\beta_c$, green) and the Berry et al. adjustment ($\beta_B$, purple) compared to the true structural slope ($\beta$, horizontal black line). The x-axis represents the ratio of measurement error variance to initial true variance ($\delta^2/\gamma^2$), where a value of 0 indicates perfect repeatability and higher values indicate increasing measurement noise. In panel (A), which represents the structural null ($\beta = 0$), the crude slope is biased toward $-1$ as noise increases while the Berry et al. estimate remains near the truth. In panel (B), where a moderate effect exists ($\beta = -0.5$), both methods diverge from the truth in opposite directions. In panel (C), representing a strong effect ($\beta = -1.5$), the crude slope approaches $-1$ while the Berry et al. method overcorrects toward zero. All simulation parameters are based on the blood pressure model system ($\mu = 141$ mmHg, $\gamma = 13.6$ mmHg, $\nu = 10$ mmHg).
  • Figure 3: Sampling distributions of regression slope estimators. Box plots represent the distribution of $10^3$ simulated slopes for a null structural effect ($\beta = 0$, panel A) and a moderate negative biological effect ($\beta = -0.5$, panel B). In each plot, the central line denotes the median, the box bounds the interquartile range (IQR), and the whiskers encompass the data within 1.5 times the IQR. Individual points represent outliers beyond this range. The horizontal black line indicates the true biological effect ($\beta$). Results demonstrate that even under a null effect, the crude ($\beta_c$) and Berry et al. ($\beta_B$) estimators remain biased by measurement error ($\delta$), while the Blomqvist ($\beta_e$) and true ($\beta_t$) slopes successfully recover the latent structural parameter. All simulations use the blood pressure model system parameters ($N=100$ per simulation).
  • Figure 4: Procedure to test for a null structural effect ($\beta=0$) using the crude slope. Using a simulated sample of systolic blood pressure ($N=100$), panel (A) shows the scatter plot of change ($d$) against initial value ($x_1$). The resulting empirical crude slope (green line) is $\beta_c = -0.423$. Panel (B) shows the histogram of $10^4$ crude slopes generated by bootstrapping the empirical sample. The vertical green lines indicate the limits of the $95\%$ bootstrap confidence interval ($[-0.569, -0.286]$). The vertical light-blue line indicates the expected null slope ($\beta_c = -0.31$), which is the association expected from measurement error alone when the true biological effect is null. Because the null value falls within the confidence interval, we fail to reject the hypothesis of a null structural effect. Simulation parameters: $\mu = 141$, $\gamma = 13.6$, $\delta = 9.1$, $\alpha = -20$, and $\nu = 10$.
  • Figure 5: Analysis of heat tolerance plasticity for the lizard Anolis carolinensis. (A) Scatter plot of heat tolerance plasticity ($d$, the change in tolerance) against basal heat tolerance ($x_1$) for $N=30$ individuals. To reveal overlapping data points resulting from measurement rounding, a small amount of random jitter has been added to the point positions. The empirical regression (green line) shows a steep negative slope of $\beta_c = -0.872$. (B) Histogram of $10^4$ crude slopes obtained via bootstrapping. The vertical green lines indicate the $95\%$ confidence interval boundaries ($[-1.255, -0.415]$). This analysis replicates the standard approach in the literature, which often interprets such negative slopes as evidence of a biological trade-off or 'compensation', though our structural model suggests this result is heavily influenced by regression to the mean.
  • ...and 5 more figures
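
The testing procedure described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the generating model in `simulate` (true follow-up state equal to true baseline plus $\alpha$, plus $\beta$ times the centered baseline, plus noise with SD $\nu$) is an assumed form, all function names are hypothetical, and the bootstrap uses 2000 resamples rather than the $10^4$ in the caption to keep the run short. Repeatability is computed here from the known simulation parameters; with real data it would have to be estimated separately.

```python
import random
import statistics


def simulate(n, mu, gamma, delta, alpha, beta, nu, rng):
    """Draw n (baseline, change) pairs from the assumed structural model."""
    x1, d = [], []
    for _ in range(n):
        true1 = rng.gauss(mu, gamma)                 # true baseline state
        true2 = true1 + alpha + beta * (true1 - mu) + rng.gauss(0.0, nu)
        m1 = true1 + rng.gauss(0.0, delta)           # measured baseline
        m2 = true2 + rng.gauss(0.0, delta)           # measured follow-up
        x1.append(m1)
        d.append(m2 - m1)
    return x1, d


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_ci(x, y, n_boot, rng, level=0.95):
    """Percentile bootstrap interval for the crude slope (resampling pairs)."""
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    lo = slopes[int((1 - level) / 2 * n_boot)]
    hi = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


rng = random.Random(1)  # fixed seed so the run is repeatable
# Caption parameters: mu=141, gamma=13.6, delta=9.1, alpha=-20, nu=10; beta=0 is the null.
x1, d = simulate(100, 141.0, 13.6, 9.1, -20.0, 0.0, 10.0, rng)
crude = ols_slope(x1, d)
lo, hi = bootstrap_ci(x1, d, 2000, rng)

r = 13.6**2 / (13.6**2 + 9.1**2)   # repeatability from known parameters
null_slope = -(1.0 - r)            # crude slope expected from RTM alone
reject_null = not (lo <= null_slope <= hi)
print(f"crude slope {crude:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], "
      f"null {null_slope:.3f}, reject beta = 0: {reject_null}")
```

The key design choice is that the confidence interval is compared with the RTM-induced null slope $-(1 - R)$ rather than with zero, so a clearly negative crude slope is not, by itself, evidence of a structural effect.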