The Need for a Big World Simulator: A Scientific Challenge for Continual Learning

Saurabh Kumar; Hong Jun Jeon; Alex Lewandowski; Benjamin Van Roy

TL;DR

This work argues that existing continual learning benchmarks fail to capture the complexity of learning in a big world with bounded agents. It formalizes an information-theoretic framework for environments and agents and introduces two design desiderata for a big world simulator: there are no diminishing returns to increasing an agent's capacity, and an optimal finite-capacity agent never stops learning. It connects these ideas to forgetting and implasticity through a decomposition of prediction error. The authors propose a Turing-complete prediction environment based on Rule 110 as an illustrative example that satisfies the desiderata and exhibits capacity-driven improvements and persistent non-stationarity. The practical impact is a principled blueprint for building simulators that enable rapid prototyping at small scale while remaining relevant to real-world continual learning challenges. The work motivates future research on robust evaluation metrics and on algorithms that can sustain continual engagement with a complex, evolving world.

Abstract

The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the agent must be carefully designed to ingest, retain, and eject the right information. To enable the development of performant continual learning agents, a number of synthetic environments have been proposed. However, these benchmarks suffer from limitations, including unnatural distribution shifts and a lack of fidelity to the "small agent, big world" framing. This paper aims to formalize two desiderata for the design of future simulated environments. These two criteria aim to reflect the objectives and complexity of continual learning in practical settings while enabling rapid prototyping of algorithms on a smaller scale.

Paper Structure (17 sections, 1 theorem, 12 equations, 5 figures)

Introduction
A Common Recipe for Synthetic Continual Learning Benchmarks
Formalizing the Environment and Agent
Notation
The Environment
The Agent
Predictions and Error
Desiderata for a Big World Simulator
There are no diminishing returns to increasing an agent's capacity.
An optimal finite capacity agent interacting with the environment will never stop learning.
Forgetting and Implasticity
An Illustrative Example: Turing-complete Prediction Environment
Conclusion
Additional Details on Existing Continual Learning Benchmarks
Non-synthetic Continual Learning Benchmarks
...and 2 more sections

Key Result

Theorem 1

(forgetting and implasticity) For all agents $\pi:\mathcal{U}\times \mathcal{X} \mapsto \mathcal{U}$, if for all $t,\ U_{t+1} = \pi(U_t, X_t)$, then

Figures (5)

Figure 1: To illustrate the notion that there are no diminishing returns to increasing agent capacity, we plot what the capacity versus optimal prediction error would look like for a $1$-complex environment. This is simply the curve $\mathcal{L}(c) = \frac{1}{c}$. In order to reduce error by a factor of $10$, we must put forward $10$ times the capacity. When both the x- and y-axes are in log scale, this curve appears as a straight line with slope $-1$ (a slope of magnitude $1$).
Figure 2: Four different simulations of Rule 110 with periodic boundary conditions, starting from the binary representation of integers 1 (top-left), 2 (top-right), 53 (bottom-left), and 107 (bottom-right). The gray shaded region is the unobserved region used to simulate the infinite state. At each time step, the agent observes a vertical slice of the unshaded region.
Figure 3: The error as a function of depth approximates $k$-complexity. Here, the depth of a feed-forward neural network is our measure of capacity.
Figure 4: Top: Rule 110 updates each cell of the state using the value of the cell as well as its neighbours' values at the previous time step. (Left-Right): There are 8 possible configurations of the cell and its neighbours. Each configuration determines the cell value of the middle cell in the next state. Bottom: Zooming in on one simulation and the three cells in the blue box near the observable region, the fifth rule above is applied to output the middle cell's value at the next time step in the yellow box. This cell's value on the border of the observable region depends on the cell in the unobservable region. Similarly, the 2 blue cells at the top are connected to the single cell in the unobservable region in the bottom.
Figure 5: Top: Online accuracy for the medium-sized neural network without regularization (left) and with regenerative regularization (right). Bottom: Predictions over larger horizons are more difficult, but increasing capacity via depth (left) and width (right) leads to performance improvement. The opacity of each line indicates the prediction horizon.
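
The Rule 110 update described in the captions for Figures 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`rule110_step`, `int_to_state`, `simulate`), the most-significant-bit-first encoding of the initial integer, and the choice of a contiguous observed window (`obs_start`, `obs_width`) are all assumptions made for the example.

```python
RULE = 110  # Wolfram rule number; bit k gives the output for neighbourhood value k


def rule110_step(state):
    """Apply one Rule 110 update to a list of 0/1 cells with periodic boundaries."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[(i - 1) % n]    # wraps around at the left edge
        center = state[i]
        right = state[(i + 1) % n]   # wraps around at the right edge
        idx = (left << 2) | (center << 1) | right  # neighbourhood as a 3-bit number
        nxt.append((RULE >> idx) & 1)
    return nxt


def int_to_state(value, width):
    """Binary representation of `value`, most significant bit first (assumed encoding)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def simulate(initial, width, steps, obs_start, obs_width):
    """Run the automaton and return the observed slice at each time step.

    Cells outside [obs_start, obs_start + obs_width) play the role of the
    unobserved region; the window layout is a hypothetical choice.
    """
    state = int_to_state(initial, width)
    observations = []
    for _ in range(steps):
        observations.append(state[obs_start:obs_start + obs_width])
        state = rule110_step(state)
    return observations


if __name__ == "__main__":
    for row in simulate(initial=1, width=16, steps=8, obs_start=10, obs_width=6):
        print("".join("#" if cell else "." for cell in row))
```

Because cells at the edge of the observed window depend on neighbours in the unobserved region, the observed slice alone does not determine its own next value, which is the dependency the Figure 4 caption points out.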

Theorems & Definitions (4)

Definition 4.1
Definition 4.2
Theorem 1
proof
