How Patterns Dictate Learnability in Sequential Data

Mario Morawski, Anais Despres, Rémi Rehm

TL;DR

This work tackles the question of intrinsic learnability in sequential data by introducing a predictive-information framework anchored in $I_{\text{pred}}(k,k')$. It establishes a universal learning curve $\Lambda(k)$ that ties the growth of predictive information to the minimal achievable risk, and derives explicit asymptotics for parametric and Markov regimes. The authors propose a model-aware estimator of the intrinsic risk $\hat{\mathcal{R}}^{\infty}(Q^{*})$ and demonstrate, on synthetic data, how this tool can diagnose whether predictive gaps arise from data structure or model capacity. The framework provides a principled, information-theoretic diagnostic for model adequacy and pattern strength in sequential data, with practical implications for model selection and understanding fundamental limits of forecasting.

Abstract

Sequential data - ranging from financial time series to natural language - has driven the growing adoption of autoregressive models. However, these algorithms rely on the presence of underlying patterns in the data, and their identification often depends heavily on human expertise. Misinterpreting these patterns can lead to model misspecification, resulting in increased generalization error and degraded performance. The recently proposed evolving pattern (EvoRate) metric addresses this by using the mutual information between the next data point and its past to guide regression order estimation and feature selection. Building on this idea, we introduce a general framework based on predictive information, defined as the mutual information between the past and the future, $I(X_{\text{past}}; X_{\text{future}})$. This quantity naturally defines an information-theoretic learning curve, which quantifies the amount of predictive information available as the observation window grows. Using this formalism, we show that the presence or absence of temporal patterns fundamentally constrains the learnability of sequential models: even an optimal predictor cannot outperform the intrinsic information limit imposed by the data. We validate our framework through experiments on synthetic data, demonstrating its ability to assess model adequacy, quantify the inherent complexity of a dataset, and reveal interpretable structure in sequential data.
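
To make the past-future mutual information concrete, the following is a minimal sketch, not the authors' estimator. It assumes a stationary Gaussian AR(2) process, for which the mutual information between a past window of length $k$ and a future window of length $k'$ has a closed form in terms of covariance log-determinants. Reading $k$ and $k'$ as window lengths is an assumption here, and the function names, the AR coefficients, and the noise variance are all hypothetical choices made for illustration.

```python
import numpy as np


def ar2_autocovariance(a1, a2, noise_var, n_lags):
    """Autocovariances gamma(0..n_lags-1) of the stationary AR(2) process
    x_t = a1 * x_{t-1} + a2 * x_{t-2} + eps_t, eps_t ~ N(0, noise_var).
    Illustrative helper; assumes (a1, a2) lie in the stationarity region."""
    m = max(n_lags, 3)  # rho[2] is always needed to compute gamma(0)
    rho = np.empty(m)
    rho[0] = 1.0
    rho[1] = a1 / (1.0 - a2)  # Yule-Walker equation at lag 1
    for h in range(2, m):
        rho[h] = a1 * rho[h - 1] + a2 * rho[h - 2]  # Yule-Walker recursion
    gamma0 = noise_var / (1.0 - a1 * rho[1] - a2 * rho[2])
    return gamma0 * rho[:n_lags]


def gaussian_pred_info(gamma, k, k_future):
    """Mutual information (in nats) between k consecutive past values and the
    next k_future values of a stationary Gaussian process with
    autocovariance sequence gamma (length >= k + k_future)."""
    n = k + k_future
    idx = np.arange(n)
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]  # Toeplitz covariance

    def logdet(matrix):
        return np.linalg.slogdet(matrix)[1]

    # I(past; future) = 0.5 * (log|C_past| + log|C_future| - log|C_joint|)
    return 0.5 * (logdet(cov[:k, :k]) + logdet(cov[k:, k:]) - logdet(cov))


# Hypothetical parameters, chosen only to give a stationary process.
gamma = ar2_autocovariance(a1=0.5, a2=0.3, noise_var=1.0, n_lags=16)
info_by_window = {k: gaussian_pred_info(gamma, k, k_future=5) for k in range(1, 7)}
for k, info in info_by_window.items():
    print(f"k={k}: I(past_k; future_5) = {info:.4f} nats")
```

In this toy case the value stops growing once $k$ reaches the AR order, because an order-2 Markov process carries no information about the future beyond its last two values. No window length beyond that can improve even an optimal predictor.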

Paper Structure

This paper contains 49 sections, 19 theorems, 76 equations, 4 figures, 8 tables, 1 algorithm.

Key Result

Proposition 4.1

Under hypothesis $\mathbf{(H_0)}$, we have: Proof in Appendix prop:learning_curve_approximation.

Figures (4)

  • Figure 1: Estimation of $\mathbf{I}_{\text{pred}}(k, k')$ using various neural-based methods. Color encodes the estimation bias: blue regions indicate negative bias (underestimation), while the intensity reflects the magnitude of this bias.
  • Figure 2: Learning curves $\Lambda(k)$ for AR processes with orders $p=5$ and $p=10$.
  • Figure 3: Estimation of $\mathbf{I}_{\text{pred}}(2, 5)$ using various neural-based methods.
  • Figure 4: Estimation of $\mathbf{I}_{\text{pred}}(10, 20)$ using various neural-based methods.

Theorems & Definitions (33)

  • Proposition 4.1: Bialek and Tishby (1999)
  • Proposition 4.2: Predictive information in Markov processes
  • Theorem 4.3: Predictive information in parametric models
  • Corollary 4.4: Universal Learning Curve Decay
  • Proposition 4.5
  • Proposition 4.6
  • Corollary 4.7
  • Remark 5.1
  • Proposition A.1: Elementary properties of the predictive mutual information
  • proof
  • ...and 23 more