In-Context Learning for Pure Exploration
Alessio Russo, Ryan Welch, Aldo Pacchiano
TL;DR
The paper addresses active sequential hypothesis testing (pure exploration) by introducing In-Context Pure Exploration (ICPE), a Transformer-based meta-learning framework that jointly learns data-collection policies and inference rules across task families. ICPE trains two Transformers, $I_\phi$ for posterior-based inference and $Q_\theta$ for action selection, and supports both fixed-budget and fixed-confidence settings without requiring explicit likelihood models at test time; inference is a single forward pass that relies on learned priors over hypotheses. Theoretical results show that the optimal inference rule selects the hypothesis that maximizes the posterior and that the learned RL objective aligns with information-rich data collection. The paper characterizes the optimal policies in the fixed-budget and fixed-confidence settings and gives stopping criteria that achieve $\delta$-correctness under identifiability assumptions. Empirically, ICPE matches or surpasses principled baselines on stochastic and deterministic bandits and on generalized search tasks (e.g., MNIST region sampling, probabilistic binary search), demonstrating robust transfer across non-tabular environments and latent information structures. This work highlights Transformers as practical, structure-aware architectures for sequential testing and meta-learning, enabling efficient hypothesis identification across diverse tasks without hand-crafted models of the information structure.
Abstract
We study the problem of active sequential hypothesis testing, also known as pure exploration: given a new task, the learner adaptively collects data from the environment to efficiently determine an underlying correct hypothesis. A classical instance of this problem is identifying the best arm in a multi-armed bandit problem (a.k.a. BAI, Best-Arm Identification), where actions index hypotheses. Another important case is generalized search, the problem of determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label. In this work, we introduce In-Context Pure Exploration (ICPE), which meta-trains Transformers to map observation histories to query actions and a predicted hypothesis, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates. Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for general sequential testing.
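To make the fixed-confidence setting concrete, the following is a minimal sketch of a pure-exploration loop on a toy best-arm identification task. It is not ICPE: it assumes a known likelihood table (which ICPE explicitly does not require), replaces the learned inference network with an exact Bayesian posterior, and replaces the learned query policy with a round-robin placeholder. All names (`posterior_update`, `run_episode`, `likelihoods`) are hypothetical and chosen for illustration only.

```python
import math
import random


def posterior_update(log_post, likelihoods, action, obs):
    """One Bayes step over a finite hypothesis set, in log space.

    likelihoods[h][action] is P(obs = 1 | hypothesis h, action); this table
    is an assumption of the sketch and must lie strictly inside (0, 1).
    """
    new = []
    for h, lp in enumerate(log_post):
        p = likelihoods[h][action]
        new.append(lp + math.log(p if obs == 1 else 1.0 - p))
    # Normalize with log-sum-exp for numerical stability.
    m = max(new)
    log_z = m + math.log(sum(math.exp(v - m) for v in new))
    return [v - log_z for v in new]


def run_episode(likelihoods, true_h, delta=0.05, max_steps=1000, seed=0):
    """Query until the posterior mass of one hypothesis reaches 1 - delta.

    Returns (guess, steps, confidence), where guess maximizes the posterior.
    """
    rng = random.Random(seed)
    n_hyp, n_act = len(likelihoods), len(likelihoods[0])
    log_post = [-math.log(n_hyp)] * n_hyp  # uniform prior over hypotheses
    for t in range(max_steps):
        guess = max(range(n_hyp), key=lambda h: log_post[h])
        confidence = math.exp(log_post[guess])
        if confidence >= 1.0 - delta:  # stopping rule
            return guess, t, confidence
        action = t % n_act  # placeholder policy, NOT a learned Q_theta
        obs = 1 if rng.random() < likelihoods[true_h][action] else 0
        log_post = posterior_update(log_post, likelihoods, action, obs)
    guess = max(range(n_hyp), key=lambda h: log_post[h])
    return guess, max_steps, math.exp(log_post[guess])


if __name__ == "__main__":
    # Hypothesis h says "arm h is best": arm h pays 1 w.p. 0.8, others w.p. 0.2.
    likelihoods = [[0.8 if a == h else 0.2 for a in range(3)] for h in range(3)]
    guess, steps, confidence = run_episode(likelihoods, true_h=1)
    print(f"guess={guess} steps={steps} confidence={confidence:.3f}")
```

In this sketch, stopping once the posterior mass of the leading hypothesis reaches $1 - \delta$ is one simple way to target a small error probability; the paper's own stopping criteria and guarantees are stated under its identifiability assumptions and may differ in form.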
