Prospective Learning: Learning for a Dynamic Future

Ashwin De Silva, Rahul Ramesh, Rubing Yang, Siyu Yu, Joshua T Vogelstein, Pratik Chaudhari

TL;DR

Prospective Learning (PL) reframes learning for dynamic environments where data distributions and objectives evolve, enabling time-aware prediction through a sequence of predictors and a time-augmented loss framework. The paper defines Prospective Risk and Prospective Bayes Risk, introduces Prospective ERM as a strong learner under consistency and concentration conditions, and provides theoretical guarantees for convergence to the Bayes limit in time-varying settings. Empirical validation on synthetic data, MNIST, and CIFAR-10 shows Prospective ERM can track changing tasks and reduce prospective risk, unlike standard ERM and many online continual-learning baselines. The work also explores discounted losses and periodic/Markovian dynamics, discusses connections to related paradigms, and lays groundwork for scalable, time-aware learning in nonstationary real-world systems.

Abstract

In real-world applications, the distribution of the data, and our goals, evolve over time. The prevailing theoretical framework for studying machine learning, namely probably approximately correct (PAC) learning, largely ignores time. As a consequence, existing strategies to address the dynamic nature of data and goals exhibit poor real-world performance. This paper develops a theoretical framework called "Prospective Learning" that is tailored for situations when the optimal hypothesis changes over time. In PAC learning, empirical risk minimization (ERM) is known to be consistent. We develop a learner called Prospective ERM, which returns a sequence of predictors that make predictions on future data. We prove that the risk of prospective ERM converges to the Bayes risk under certain assumptions on the stochastic process generating the data. Prospective ERM, roughly speaking, incorporates time as an input in addition to the data. We show that standard ERM as done in PAC learning, without incorporating time, can result in failure to learn when distributions are dynamic. Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST and CIFAR-10. Code at https://github.com/neurodata/prolearn.
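The idea of giving the learner time as an extra input can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's implementation: the data process (a fair-coin input whose label rule flips at every step), the lookup-table hypothesis class, and the time feature `t % 2` (which presumes the period is known) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(times):
    """Toy process (assumed): X_t is a fair coin; the label rule flips every step."""
    x = rng.integers(0, 2, size=len(times))
    y = x ^ (times % 2)  # y = x at even t, y = 1 - x at odd t
    return x, y

def fit_lookup(cells, y, n_cells):
    """ERM over lookup tables: predict the majority label within each cell."""
    table = np.zeros(n_cells, dtype=int)
    for cell in range(n_cells):
        labels = y[cells == cell]
        if labels.size:
            table[cell] = int(labels.mean() > 0.5)
    return table

# Training data comes from past times only.
train_t = np.arange(0, 200)
x_train, y_train = sample(train_t)

# Time-agnostic ERM: the hypothesis sees only x.
agnostic = fit_lookup(x_train, y_train, n_cells=2)

# Time-aware ERM: the hypothesis sees (x, t mod 2), encoded as a single cell index.
aware = fit_lookup(2 * x_train + train_t % 2, y_train, n_cells=4)

# Evaluate both hypotheses on future times.
test_t = np.arange(200, 1200)
x_test, y_test = sample(test_t)
agnostic_err = np.mean(agnostic[x_test] != y_test)
aware_err = np.mean(aware[2 * x_test + test_t % 2] != y_test)

print(f"time-agnostic error on future data: {agnostic_err:.3f}")
print(f"time-aware error on future data:    {aware_err:.3f}")
```

In this toy setting no function of `x` alone can do better than chance on future data, while the time-aware table recovers the label rule exactly. A parity feature is the simplest possible time encoding; richer encodings of time would be needed when the dynamics are not known in advance.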

Paper Structure

This paper contains 53 sections, 6 theorems, 77 equations, 16 figures, 1 table.

Key Result

Proposition 1

There exist stochastic processes for which time-agnostic ERM is not a weak prospective learner. There also exist stochastic processes for which time-agnostic ERM is a weak prospective learner but not a strong one.
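A short worked example may help make the first claim concrete. It is an illustration under stated assumptions, not the paper's proof: suppose $X_t = 1$ for all $t$, $Y_t \sim \text{Bernoulli}(p)$ at even times and $Y_t \sim \text{Bernoulli}(1-p)$ at odd times with $p = 0.2$, and take the prospective risk to be the long-run average of the expected 0-1 loss over future times.

```latex
% With X_t constant, a time-agnostic hypothesis reduces to a constant prediction.
% Each line averages the expected 0-1 loss over even and odd times.
\begin{align*}
\text{risk of } h \equiv 0 &= \tfrac{1}{2}\,p + \tfrac{1}{2}\,(1-p) = \tfrac{1}{2},\\
\text{risk of } h \equiv 1 &= \tfrac{1}{2}\,(1-p) + \tfrac{1}{2}\,p = \tfrac{1}{2},\\
\text{risk of } h_t = \mathbf{1}(t \text{ odd}) &= \tfrac{1}{2}\,p + \tfrac{1}{2}\,p = p = 0.2 .
\end{align*}
```

Under these assumptions every time-agnostic hypothesis sits at chance level, regardless of how much data is used to select it, whereas a hypothesis that depends on $t$ attains risk $\min(p, 1-p)$.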

Figures (16)

  • Figure 1: A schematic for prospective learning (left) and realizations of the examples for the four scenarios (top right); dots denote 1s and empty spaces denote 0s for $Y_t \in \{0,1\}$ with $X_t = 1$ for all times $t$. Prospective risk of learners at different times is shown in the bottom panels and discussed in the examples section. Scenario 1: For Bernoulli probability $p=0.2$, the maximum-likelihood estimator (MLE) in blue uses a time-agnostic hypothesis $h_t(X_t) = \mathbf{1}(\hat{p}_t > 0.5)$ where $\hat{p}_t = t^{-1} \sum_{s=1}^t y_s$; ties at $\hat{p}_t=0.5$ are broken randomly. The risk of this learner converges to the Bayes risk. Scenario 2: For Bernoulli probability $p=0.2$, the MLE (blue) performs at chance levels. A prospective learner (red) that alternates between two predictors at even and odd times converges to the Bayes risk. Variants of this learner that use less information about the stochastic process (purple does not know that the data distributions at even and odd times are tied, green does not know that the distribution shifts at every time-step) also converge to the Bayes risk, but more slowly. Scenario 3: For $\theta = 0.1$ and $\gamma=0.9$ in the discounted prospective risk, the MLE (blue) again performs at chance levels. A prospective learner that estimates the transition probability of the two-state Markov chain, and uses it to estimate $\mathbb{P}(Y_{t'} \mid y_t)$ for future times $t' > t$, converges to the Bayes risk. Scenario 4: For $\theta_0 = \theta_1 = 0.1$, the MLE (blue) performs at chance levels. A prospective learner that uses a variant of Q-learning (described in the text and in the appendix on Scenario 4) converges to the prospective Bayes risk.
  • Figure 2: Prospective ERM can achieve good instantaneous and prospective risk in Scenario 2. Left: Instantaneous and prospective risks for problems constructed using synthetic data (see text) across 5 random seeds (which govern the sequence of samples and the weight initializations of neural networks). For many online learning baselines, instantaneous risk spikes when the task switches. In contrast, prospective ERM has minimal spikes at later times, and both its instantaneous and prospective risks eventually converge to zero. Right: Prospective risk for different baseline algorithms and prospective ERM for Scenario 2 tasks constructed using MNIST and CIFAR-10. In all three cases, the risk of prospective ERM approaches the Bayes risk, while the online learning baselines considered here do not achieve a low prospective risk. For comparison, the chance prospective risk is 0.5 for synthetic data and 0.742 for the MNIST and CIFAR-10 tasks.
  • Figure 3: Left: For MNIST and CIFAR-10, we consider 4 tasks corresponding to the classes 1-5, 4-7, 6-9 and 8-10. Using these tasks, we construct Scenario 3 problems corresponding to a stochastic process which is a hierarchical hidden Markov model. After every 10 time-steps, a different Markov chain governs transitions among tasks (one Markov chain for tasks 1 and 2, and another for tasks 3 and 4). This ensures that the stochastic process does not have a stationary distribution. Right: For synthetic data, the 4 tasks are created using two-dimensional input data as shown pictorially above. The four parts of the input domain are $\{(x_1,x_2): 1\leq x_1,x_2 \leq 2 \}$, $\{(x_1,x_2): 1 \leq x_1 \leq 2, \text{ and} \ -2\leq x_2 \leq -1 \}$, $\{(x_1,x_2): -2 \leq x_1, x_2 \leq -1 \}$ and $\{(x_1,x_2): -2 \leq x_1 \leq -1 \ \text{and} \ 1 \leq x_2 \leq 2 \}$. Colors indicate classes. The hierarchical hidden Markov model for transitions among the tasks is identical to the MNIST and CIFAR-10 setting shown on the left.
  • Figure 4: Prospective ERM can achieve good prospective risk in Scenario 3. Prospective risk across 5 random seeds (which govern the sequence of samples and the weight initializations of neural networks). In all three cases, the risk of prospective ERM approaches the Bayes risk, while a number of baseline algorithms do not achieve a low prospective risk. The stochastic processes in these Scenario 3 problems do not have an invariant distribution. This is why a time-agnostic hypothesis (ERM) that is constructed by the baseline algorithms does not achieve a good prospective risk.
  • Figure A.1: Prospective risk of MLE (blue), MAP (purple), prospective MAP (red) and random-chance (orange) based learners with respect to time. Both MAP and prospective MAP estimators assume a prior distribution of $\text{Beta}(12, 16)$ over $p$.
  • ...and 11 more figures
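
The Markov-chain learner mentioned in the Figure 1 caption can be sketched roughly as follows. This is a hedged illustration rather than the authors' code: it assumes $\theta$ is the probability that the label flips at each step, assumes a normalized geometric discount $(1-\gamma)\gamma^{k-1}$ over the horizon $k$, and uses hypothetical helper names (`predict`, `discounted_error`).

```python
import numpy as np

rng = np.random.default_rng(0)
theta, gamma, T = 0.1, 0.9, 5000  # theta: assumed per-step flip probability

# Simulate a symmetric two-state Markov chain of labels.
y = np.zeros(T, dtype=int)
for t in range(1, T):
    flip = rng.random() < theta
    y[t] = y[t - 1] ^ int(flip)

# Estimate the transition matrix from observed transition counts.
counts = np.zeros((2, 2))
np.add.at(counts, (y[:-1], y[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)

def predict(y_now, k):
    """Most likely label k steps ahead under the estimated chain."""
    return int(np.argmax(np.linalg.matrix_power(P_hat, k)[y_now]))

def discounted_error(y_now, horizon=200):
    """Discounted expected 0-1 error of the plug-in predictor under the true chain."""
    P = np.array([[1 - theta, theta], [theta, 1 - theta]])
    Pk = np.eye(2)
    err = 0.0
    for k in range(1, horizon + 1):
        Pk = Pk @ P  # true k-step transition probabilities
        weight = (1 - gamma) * gamma ** (k - 1)  # assumed normalized discount
        err += weight * (1 - Pk[y_now, predict(y_now, k)])
    return err

print("estimated transition matrix:\n", P_hat)
print("discounted error given y_t = 0:", discounted_error(0))
```

The plug-in predictor conditions on the most recent label, which is what lets it beat a time-agnostic majority vote when consecutive labels are strongly correlated; the truncation at `horizon=200` is a convenience, since the discount weights beyond that point are negligible for $\gamma = 0.9$.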

Theorems & Definitions (18)

  • Definition 1: Strong Prospective Learnability
  • Definition 2: Weak Prospective Learnability
  • Definition 3: Time-agnostic ERM
  • Proposition 1
  • Theorem 1: Prospective ERM is a strong prospective learner
  • Remark 1: How to implement prospective ERM?
  • Corollary 1
  • Remark 2: Why we need an increasing sequence of hypothesis classes $\mathcal{H}_1 \subseteq \mathcal{H}_2 \dots$
  • Theorem 2
  • Remark 3: Why we do not use existing benchmark continual learning scenarios
  • ...and 8 more