Learning Universal Predictors
Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
TL;DR
Introduces Solomonoff Induction and its universal prior $M$, highlighting its incomputability and the motivation to approximate it via meta-learning. Frames meta-learning as amortized Solomonoff Induction, showing that neural models trained on diverse, algorithmically generated data can approximate the Bayesian mixture over programs and converge toward the normalized prior $M^{norm}$ under suitable assumptions. Defines computable Solomonoff data generators $M_{s,L,n}$ and the normalized prior $M^{norm}$, proves consistency of the empirical estimates, and outlines training with fixed-length sequences to realize convergence to $\hat{M}^{norm}$. Extends the analysis to non-uniform program sampling with $M_U^Q$ and shows that universality is preserved under mild conditions. Reports extensive experiments across UTMs, VOMS, and Chomsky-hierarchy tasks, showing that model scale and universal data jointly promote increasingly universal predictive capabilities, with patterns that transfer across domains.
Abstract
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks by pushing meta-learning to its limits. We use Universal Turing Machines (UTMs) to generate training data that exposes networks to a broad range of patterns. We provide theoretical analysis of the UTM data generation processes and meta-training protocols. We conduct comprehensive experiments with neural architectures (e.g., LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
