On the Unknowable Limits to Prediction

Jiani Yan, Charles Rahal

TL;DR

The paper reframes prediction limits by introducing an information-set–conditional decomposition of error into epistemic and aleatoric components, arguing that irreducible error need not be treated as a fixed ceiling. It formalizes true versus observed quantities and derives a comprehensive error decomposition, linking model and data measurement improvements to reductions in epistemic error while acknowledging unavoidable aleatoric randomness. By connecting to the bias-variance framework and learning-curve analysis, it shows how predictive accuracy can improve monotonically with better measurement, construct validity, and model approximation, even in the absence of distributional shift. The framework has practical implications for advancing predictive science across domains, including AI systems and social data, guiding data collection, feature construction, and modeling choices to progressively tighten predictions while recognizing intrinsic stochasticity.

Abstract

We propose a rigorous decomposition of predictive error, highlighting that not all 'irreducible' error is genuinely immutable. Many domains stand to benefit from iterative enhancements in measurement, construct validity, and modeling. Our approach demonstrates how apparently 'unpredictable' outcomes can become more tractable with improved data (across both target and features) and refined algorithms. By distinguishing aleatoric from epistemic error, we delineate how accuracy may asymptotically improve--though inherent stochasticity may remain--and offer a robust framework for advancing computational research.
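The claim that part of the apparently 'irreducible' error shrinks as measurement improves can be illustrated with a small simulation. What follows is a minimal sketch under assumptions that are ours rather than the paper's: a single standard-normal feature, a linear outcome, Gaussian aleatoric noise, and additive Gaussian measurement error on the feature. The function name `simulate_mse` and the parameters `tau`, `beta`, and `sigma` are hypothetical and chosen only for illustration.

```python
import random


def simulate_mse(tau, n=20000, beta=2.0, sigma=0.5, seed=0):
    """Out-of-sample MSE of a one-feature least-squares fit when the feature
    is observed with additive measurement noise of standard deviation tau.

    Illustrative assumptions (not taken from the paper):
      x_true ~ N(0, 1)
      y      = beta * x_true + eps,  eps ~ N(0, sigma^2)   (aleatoric part)
      x_obs  = x_true + u,           u   ~ N(0, tau^2)     (measurement error)
    """
    rng = random.Random(seed)

    def draw(m):
        xs, ys = [], []
        for _ in range(m):
            x_true = rng.gauss(0.0, 1.0)
            y = beta * x_true + rng.gauss(0.0, sigma)
            x_obs = x_true + rng.gauss(0.0, tau)
            xs.append(x_obs)
            ys.append(y)
        return xs, ys

    x_train, y_train = draw(n)
    x_test, y_test = draw(n)

    # Closed-form simple linear regression fitted on the observed feature.
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Mean squared error on fresh data drawn from the same process.
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(x_test, y_test)) / n


if __name__ == "__main__":
    # Shrinking tau stands in for progressively better measurement.
    for tau in (1.0, 0.5, 0.1, 0.0):
        print(f"tau={tau:.1f}  test MSE={simulate_mse(tau):.3f}")
```

Under these assumptions the population error of the best linear predictor is sigma^2 + beta^2 * tau^2 / (1 + tau^2). The second term disappears as the feature is measured more precisely, while the first term, sigma^2, stays as the aleatoric floor. An analyst who saw only the noisy feature could mistake the whole sum for a fixed predictive ceiling.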

Paper Structure

This paper contains 8 sections, 12 equations, 2 figures.

Figures (2)

  • Figure 1: Decomposed Learning Curves. The functional form of learning curves can vary depending on the targets, algorithms, and features being used, and different amounts of truly irreducible error may exist for different target variables (see Supplementary Figure 1). As relevant information accumulates, the learning curve -- defined as the functional representation of a model's predictive performance against the amount of information that it receives -- is expected to be monotonic, but may not be entirely continuous. Panel 'a.' represents a baseline scenario which positions specific studies against how well information on target and features has been measured (and how well such information has been mapped into learning algorithms), while Panels 'b.' and 'c.' represent changes (and the resulting gains in predictive performance) with regard to the construct validity of $y$ and $\mathbf{x}$ respectively. Note: 'obs.' denotes 'observed' variables as per Equation \ref{eq:example}.
  • Supplementary Figure 1: Additional Learning Curve Examples. Two complementary examples to Figure 1. Panel 'a.' represents a target variable with relatively little aleatoric error, and a rapid learning rate with large gains from eliminating measurement error. Panel 'b.' represents a target variable with relatively higher aleatoric error, a slower learning rate, and relatively smaller gains from improving measurement. Note: there is no requirement for such learning curves to be continuous functions. Dashed horizontal red lines denote what is commonly known as the 'predictive ceiling' (no prediction error). Golden arrows denote the rate at which reducible error is falling. The dashed blue lines represent a less accurate learning algorithm, and the solid blue line represents the case where learning error is entirely eliminated.