In-Context Learning for Pure Exploration in Continuous Spaces

Alessio Russo; Yin-Ching Lee; Ryan Welch; Aldo Pacchiano

TL;DR

The paper addresses continuous pure exploration in the fixed-confidence setting by proposing C-ICPE-TS, a meta-trained Thompson Sampling framework that learns when to query and when to stop in continuous action spaces without explicit likelihood models. By grounding inference in posterior success probabilities and leveraging a stop/continue Bellman structure, the approach achieves $(\epsilon,\delta)$-PAC guarantees while efficiently identifying $\epsilon$-optimal hypotheses. Empirically, C-ICPE-TS shows superior sample efficiency across continuous analogues of binary search, $\epsilon$-best-arm identification, and gradient-free function minimization (Ackley), with robust performance under prior misspecification. The work advances learned sequential testing in continuous domains and may benefit experimental settings where data are noisy and each experiment is expensive.

Abstract

In active sequential testing, also termed pure exploration, a learner must adaptively acquire information so as to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications: classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $\epsilon$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and formulate Continuous In-Context Pure Exploration for this regime. We introduce C-ICPE-TS, an algorithm that meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS across a range of benchmarks, spanning continuous best-arm identification, region localization, and function minimizer identification.
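The query/recommend/stop loop described above can be illustrated with a small toy. The sketch below is not C-ICPE-TS: it replaces the meta-trained networks with an exact grid posterior on a hypothetical one-dimensional noisy threshold-search task, and every name in it (`posterior_success`, `run_episode`, the `flip` noise level, the grid size) is an assumption made for illustration. It only shows the generic pattern of sampling a query from the posterior (Thompson-style), recommending the point with the largest posterior success probability, and stopping once that probability reaches $1-\delta$.

```python
import numpy as np

def posterior_success(grid, post, eps):
    """Posterior mass of {theta : |theta - x| <= eps} for every candidate x on the grid."""
    cdf = np.concatenate(([0.0], np.cumsum(post)))
    lo = np.searchsorted(grid, grid - eps, side="left")
    hi = np.searchsorted(grid, grid + eps, side="right")
    return cdf[hi] - cdf[lo]

def run_episode(theta, eps=0.05, delta=0.1, flip=0.1,
                max_steps=500, n_grid=401, seed=0):
    """Toy task (assumed): theta in [0, 1]; a query a returns 1{theta >= a}, flipped w.p. `flip`."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)
    post = np.full(n_grid, 1.0 / n_grid)           # uniform prior over the grid
    for t in range(1, max_steps + 1):
        succ = posterior_success(grid, post, eps)
        best = int(np.argmax(succ))                # recommendation: maximize posterior success
        if succ[best] >= 1.0 - delta:              # stop once success probability is high enough
            return grid[best], t - 1
        a = rng.choice(grid, p=post)               # Thompson-style query: a posterior sample
        y = (theta >= a) ^ (rng.random() < flip)   # noisy binary feedback
        like = np.where((grid >= a) == y, 1.0 - flip, flip)
        post = post * like                         # Bayes update on the grid
        post /= post.sum()
    succ = posterior_success(grid, post, eps)
    return grid[int(np.argmax(succ))], max_steps   # budget exhausted: return best guess

if __name__ == "__main__":
    x_hat, n_queries = run_episode(theta=0.3137)
    print(f"recommended {x_hat:.3f} after {n_queries} queries")
```

In this toy the likelihood is known, so the posterior is exact; the abstract's point is that C-ICPE-TS learns the corresponding query and inference maps from data instead of relying on such a hand-crafted model.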

Paper Structure (48 sections, 9 theorems, 81 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Problem Setting
Theoretical Background
Optimal inference and the posterior success function.
Dual formulation, stopping and optimality.
Deep Learning and Thompson Sampling for Exploration in Continuous Space
Training protocol.
Posterior distribution, recommendation and exploration.
Critic learning and stopping rule.
Cost update.
Model Architecture and Time Pooling Layer
Empirical Evaluation
Baselines
Binary Search Problem
Problem Description.
...and 33 more sections

Key Result

Lemma A.3

For each $t\in\mathbb N$ there exists a probability kernel $R_t:\mathcal{H}_t\times\mathcal{B}(\Theta)\to[0,1]$, independent of $\pi$, such that for every policy $\pi$, all $A\in\mathcal{B}(\Theta)$ and $Z\in\mathcal{B}(\mathcal{H}_t)$, where $\mathbf P_t^\pi({\rm d}\theta,{\rm d}h)=\nu({\rm d}\theta)\,\mathbb P_{\theta,t}^\pi({\rm d}h)$ and $\mathbb P_t^\pi$ is its $\mathcal{H}_t$-marginal.
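
The identity that the quantifiers over $A$ and $Z$ refer to is not shown in this excerpt. Assuming $R_t$ plays the usual role of a regular conditional distribution of $\theta$ given the history $h$, it would plausibly take the standard disintegration form below; this is a reconstruction from the surrounding definitions, not a quotation from the paper:

$$\mathbf P_t^\pi(A\times Z)=\int_Z R_t(h,A)\,\mathbb P_t^\pi({\rm d}h).$$

Read this way, $R_t(h,\cdot)$ is the posterior over $\Theta$ after observing $h$, and its independence from $\pi$ says that the posterior depends on the data collected, not on the policy that chose the queries.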

Figures (9)

Figure 1: C-ICPE-TS identifies the global maxima (with $\epsilon$-accuracy and $1-\delta$ confidence) of the noisy Ackley function (with random parameters and observation noise) while using as few data points as possible.
Figure 2: Shared history encoder and time-pooling readout used by the inference network $I_\phi$ and critic $Q_\theta$.
Figure 3: Binary search problem in $\mathcal{X} = [-1,1]^2$: a query $A_t$ yields $Y_t$; the goal is to predict $\hat{x}$ within $\varepsilon$ of $x^\star$.
Figure 4: $\epsilon$-best-arm problem in $\mathcal{X} = [-1,1]^2$: C-ICPE queries $A_t$ and observes $Y_{t}$; the goal is to predict $\hat{x}$ within $\varepsilon$ of $x^\star$.
Figure 5: Effect of parameters $b$ and $c$ on the Ackley function landscape.
...and 4 more figures
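
For readers unfamiliar with the benchmark in Figures 1 and 5, here is a minimal sketch of a noisy Ackley oracle. It uses the textbook Ackley formula with its customary defaults ($a=20$, $b=0.2$, $c=2\pi$); the parameter ranges, noise model, and any sign convention used in the paper (Figure 1 speaks of maxima, which would correspond to the negated function) are not given here, so the Gaussian noise and the `sigma` argument are assumptions.

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Textbook Ackley function; global minimum 0 at the origin."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    rms = np.sqrt(np.mean(x ** 2))
    return -a * np.exp(-b * rms) - np.exp(np.mean(np.cos(c * x))) + a + np.e

def noisy_ackley(x, rng, sigma=0.1, **params):
    """Ackley value plus zero-mean Gaussian noise (noise model assumed for illustration)."""
    return ackley(x, **params) + sigma * rng.normal()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(ackley([0.0, 0.0]), noisy_ackley([0.5, -0.5], rng))
```

Smaller `b` flattens the outer bowl and larger `c` increases the frequency of the cosine ripples, which is the kind of landscape change Figure 5 refers to.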

Theorems & Definitions (19)

Example 2.1
Lemma A.3: Posterior kernel over $\Theta$
proof
Proposition A.4: Optimal inference
proof
Lemma A.5: Markov lower bound on posterior success
proof
Lemma A.6: Stopped success as expected posterior success
proof
Lemma A.7: Embedding stopping times as a stop action
...and 9 more
