Stable but Wrong: When More Data Degrades Scientific Conclusions

Zhipeng Zhang, Kai Li

Abstract

Modern science increasingly relies on ever-growing observational datasets and automated inference pipelines, under the implicit belief that accumulating more data makes scientific conclusions more reliable. Here we show that this belief can fail in a fundamental and irreversible way. We identify a structural regime in which standard inference procedures converge smoothly, remain well calibrated, and pass conventional diagnostic checks, yet systematically converge to incorrect conclusions. This failure arises when the reliability of observations degrades in a manner that is intrinsically unobservable to the inference process itself. Using minimal synthetic experiments, we demonstrate that in this regime additional data do not correct error but instead amplify it, while residual-based and goodness-of-fit diagnostics remain misleadingly normal. These results reveal an intrinsic limit of data-driven science: stability, convergence, and confidence are not sufficient indicators of epistemic validity. We argue that inference cannot be treated as an unconditional consequence of data availability, but must instead be governed by explicit constraints on the integrity of the observational process.

Paper Structure (24 sections, 1 theorem, 7 equations, 13 figures)

This paper contains 24 sections, 1 theorem, 7 equations, and 13 figures.

Key Result

Proposition 1

Consider a data-generating process where $\epsilon_t$ are independent, zero-mean noise terms with finite variance, and $b_t$ is a deterministic or stochastic drift process satisfying: (i) $b_t$ varies slowly relative to the observation noise, and (ii) $b_t$ is not identifiable from any finite window of observations. Let $\hat{\theta}$ [...] whenever the limit exists. In particular, if the time-averaged [...]
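To illustrate the kind of statement the proposition makes, here is a short worked calculation. It is a sketch under assumptions that do not appear in the text above: an additive observation model $y_t = \theta + b_t + \epsilon_t$, the sample mean as the estimator $\hat{\theta}_n$, and uniformly bounded noise variances. The symbols $y_t$, $\hat{\theta}_n$, $\bar{b}$, and $c$ are introduced here for illustration only.

$$
\hat{\theta}_n \;=\; \frac{1}{n}\sum_{t=1}^{n} y_t \;=\; \theta \;+\; \frac{1}{n}\sum_{t=1}^{n} b_t \;+\; \frac{1}{n}\sum_{t=1}^{n} \epsilon_t .
$$

Under these assumptions the noise average vanishes as $n \to \infty$ by the law of large numbers, so

$$
\hat{\theta}_n \;\longrightarrow\; \theta + \bar{b}, \qquad \bar{b} \;=\; \lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} b_t ,
$$

whenever the limit exists. The estimator therefore settles smoothly on a value that is off by the time-averaged drift. For a linear drift $b_t = c\,t$ the limit does not exist: the average drift is $c\,(n+1)/2$, so the error grows with $n$ even as the noise contribution shrinks.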

Figures (13)

  • Figure 1: The invisibility of drift. Observed data with unobservable drift demonstrates how systematic bias can accumulate while individual observations appear normally distributed around the drifting mean, showing no visible anomaly that would alert conventional diagnostics. This creates precisely the conditions where inference proceeds without alarm, yet is fundamentally compromised.
  • Figure 2: Stable convergence to false certainty. Posterior mean of $\theta$ under unobservable drift demonstrates that numerical stability and contracting uncertainty can systematically lead to biased conclusions, showing that conventional indicators of reliability become misleading when observational integrity degrades invisibly.
  • Figure 3: The paradox of more data. Absolute inference error versus data volume under unobservable drift demonstrates that additional observations can systematically increase error, directly contradicting the foundational assumption that more data invariably improves inference. This reveals a regime where data accumulation becomes epistemically harmful.
  • Figure 4: Causal control establishes specificity. No-drift control experiment confirms that the failure mode arises specifically from unobservable reliability degradation, not from any deficiency in the inference algorithm, by showing that error decreases monotonically with more data in the absence of drift.
  • Figure 5: Robustness to non-linearity. Inference error under random-walk drift demonstrates that the phenomenon persists under non-linear, stochastic drift processes, establishing that the epistemic trap is not an artifact of simple linear trends but a structural feature of inference under unobservable reliability degradation.
  • ...and 8 more figures
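The synthetic experiments summarized in the captions above can be approximated with a minimal simulation. The sketch below is not the authors' code. It assumes an additive model `y_t = theta + b_t + eps_t`, a linear drift `b_t = drift_rate * t`, and a flat-prior Gaussian posterior with known noise variance that ignores drift. The function names and every parameter value (`theta`, `sigma`, `drift_rate`, `window`) are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate(n, theta=1.0, sigma=1.0, drift_rate=0.0, seed=0):
    """Return n observations y_t = theta + b_t + eps_t with b_t = drift_rate * t.

    The additive model and the linear drift are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [theta + drift_rate * t + rng.gauss(0.0, sigma) for t in range(n)]


def running_posterior(y, sigma=1.0):
    """Posterior mean and std for theta under a flat prior and a
    known-variance Gaussian likelihood that ignores drift entirely."""
    means, stds, total = [], [], 0.0
    for n, value in enumerate(y, start=1):
        total += value
        means.append(total / n)
        stds.append(sigma / math.sqrt(n))
    return means, stds


def run_experiment(n=20000, theta=1.0, drift_rate=1e-3, window=200, seed=0):
    """Compare absolute error at a few sample sizes with and without drift."""
    checkpoints = [100, 1000, n]
    report = {}
    for label, rate in (("drift", drift_rate), ("no_drift", 0.0)):
        y = simulate(n, theta=theta, drift_rate=rate, seed=seed)
        means, stds = running_posterior(y)
        report[label] = {
            "abs_error": [abs(means[k - 1] - theta) for k in checkpoints],
            "post_std": [stds[k - 1] for k in checkpoints],
            # Spread of the most recent window around its own mean:
            # a local residual check that cannot see slow drift.
            "local_resid_std": statistics.stdev(y[-window:]),
        }
    return report


if __name__ == "__main__":
    for label, stats in run_experiment().items():
        print(label, stats)
```

With these settings the posterior standard deviation shrinks identically in both runs, and the local residual spread stays close to `sigma`. The absolute error grows with sample size only in the drift run. A windowed residual check is just one possible stand-in for the diagnostics the paper refers to. A fit over the full record would expose a linear trend this large, so the sketch illustrates the mechanism rather than the paper's non-identifiability condition.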

Theorems & Definitions (1)

  • Proposition 1: Stable convergence under unobservable drift