On the Unknowable Limits to Prediction
Jiani Yan, Charles Rahal
TL;DR
The paper reframes prediction limits by introducing an information-set–conditional decomposition of error into epistemic and aleatoric components, arguing that irreducible error need not be treated as a fixed ceiling. It formalizes true versus observed quantities and derives a comprehensive error decomposition, linking model and data measurement improvements to reductions in epistemic error while acknowledging unavoidable aleatoric randomness. By connecting to the bias-variance framework and learning-curve analysis, it shows how predictive accuracy can improve monotonically with better measurement, construct validity, and model approximation, even in the absence of distributional shift. The framework has practical implications for advancing predictive science across domains, including AI systems and social data, guiding data collection, feature construction, and modeling choices to progressively tighten predictions while recognizing intrinsic stochasticity.
Abstract
We propose a rigorous decomposition of predictive error, highlighting that not all 'irreducible' error is genuinely immutable. Many domains stand to benefit from iterative enhancements in measurement, construct validity, and modeling. Our approach demonstrates how apparently 'unpredictable' outcomes can become more tractable with improved data (across both target and features) and refined algorithms. By distinguishing aleatoric from epistemic error, we delineate how accuracy may asymptotically improve, though inherent stochasticity may remain, and offer a robust framework for advancing computational research.
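
The distinction between epistemic and aleatoric error can be illustrated with a small simulation. The sketch below is a toy example, not the paper's formal decomposition. It assumes a hypothetical outcome y = x1 + x2 + eps, with independent standard-normal features and Gaussian noise of standard deviation 0.5, and compares the best possible predictor under two information sets. The names simulate, noise_sd, mse_small, and mse_large are illustrative only.

```python
import random


def simulate(n=20000, noise_sd=0.5, seed=0):
    """Return the MSE of the optimal predictor under two information sets.

    Assumed toy process (not from the paper): y = x1 + x2 + eps.
    """
    rng = random.Random(seed)
    sq_err_small = 0.0  # information set {x1}
    sq_err_large = 0.0  # information set {x1, x2}
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, noise_sd)  # aleatoric noise, never observed
        y = x1 + x2 + eps
        pred_small = x1        # E[y | x1], since x2 and eps have mean zero
        pred_large = x1 + x2   # E[y | x1, x2]
        sq_err_small += (y - pred_small) ** 2
        sq_err_large += (y - pred_large) ** 2
    return sq_err_small / n, sq_err_large / n


mse_small, mse_large = simulate()
aleatoric_floor = 0.5 ** 2  # variance of eps under the assumed process

print(f"MSE with x1 only:     {mse_small:.3f}")
print(f"MSE with x1 and x2:   {mse_large:.3f}")
print(f"Aleatoric floor:      {aleatoric_floor:.3f}")
```

Under these assumptions, the expected error with x1 alone is Var(x2) + Var(eps) = 1.25, which an analyst who never observes x2 might label irreducible. Once x2 is measured, the expected error falls to Var(eps) = 0.25. In this toy setting the difference is epistemic, because it disappears with a richer information set, while the remaining 0.25 is aleatoric.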
