Amortized nonmyopic active search via deep imitation learning
Quan Nguyen, Anindya Sarkar, Roman Garnett
TL;DR
Active search seeks rare targets under costly labeling: the goal is to maximize the utility $u(\mathcal{D}_T)$ of the data collected within a labeling budget $T$. The state-of-the-art policy, ENS, makes nonmyopic decisions, but at a cost superlinear in the size $n$ of the search problem. The authors propose ANS, a neural policy trained by imitation learning (DAgger) to mimic ENS on synthetic problems generated from Gaussian processes, enabling real-time decisions. A compact state representation, which includes $\Pr(y=1\mid x,\mathcal{D})$, the remaining budget $\ell$, and neighbor-based summaries, is combined with a GP-based problem generator and Faiss-based nearest-neighbor search; the resulting policy performs close to ENS at far lower computational cost. This broadens the applicability of nonmyopic active search to large-scale domains such as drug discovery and recommender systems, where millions of candidates must be explored under tight labeling budgets.
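To make the state representation above concrete, the following is a minimal sketch of how per-candidate features might be assembled. It is not the authors' implementation: the function name `build_state_features`, the choice of mean and max as neighbor summaries, and the default `k` are assumptions made for illustration, and a brute-force NumPy neighbor search stands in for Faiss.

```python
import numpy as np

def build_state_features(X, probs, budget_left, k=3):
    """Hypothetical per-candidate features:
    [Pr(y=1 | x, D), remaining budget, mean neighbor prob, max neighbor prob].
    Assumes at least two candidates."""
    X = np.asarray(X, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = X.shape[0]
    # Brute-force pairwise squared distances (illustrative stand-in for Faiss).
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbor
    k = min(k, n - 1)
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbors
    nbr_p = probs[nbrs]                     # neighbor probabilities, shape (n, k)
    return np.column_stack([
        probs,
        np.full(n, float(budget_left)),     # same remaining budget for every candidate
        nbr_p.mean(axis=1),                 # assumed neighbor summary 1
        nbr_p.max(axis=1),                  # assumed neighbor summary 2
    ])

features = build_state_features(
    X=[[0.0], [1.0], [10.0]], probs=[0.1, 0.5, 0.9], budget_left=5, k=1
)
```

Each row is a fixed-length vector regardless of the pool size, which is what would let a single network score every candidate; the brute-force search here is quadratic in $n$ and would need to be replaced by an approximate index at scale.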
Abstract
Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class. The state-of-the-art algorithm approximates the optimal Bayesian policy in a budget-aware manner, and has been shown to achieve impressive empirical performance in previous work. However, even this approximate policy has a superlinear computational complexity with respect to the size of the search problem, rendering its application impractical in large spaces or in real-time systems where decisions must be made quickly. We study the amortization of this policy by training a neural network to learn to search. To circumvent the difficulty of learning from scratch, we appeal to imitation learning techniques to mimic the behavior of the expert, expensive-to-compute policy. Our policy network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions carefully balancing exploration and exploitation. Extensive experiments demonstrate that our policy achieves competitive performance on real-world tasks, closely approximating the expert's performance at a fraction of the cost while outperforming cheaper baselines.
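The imitation-learning idea can be illustrated with a toy DAgger-style loop. This is a sketch under strong assumptions, not the paper's training procedure: the expert here is a hypothetical linear scoring rule standing in for the expensive policy, the learner is a linear softmax policy rather than a neural network, and each "problem" is a random pool from which the chosen candidate is removed. All names and hyperparameters are illustrative.

```python
import numpy as np

EXPERT_W = np.array([1.0, 0.5])  # hypothetical expert scoring rule (stand-in for the expert policy)

def expert_action(pool):
    return int(np.argmax(pool @ EXPERT_W))

def learner_action(w, pool):
    return int(np.argmax(pool @ w))

def fit(w, dataset, lr=0.5, epochs=50):
    """Softmax cross-entropy over candidates, plain gradient descent."""
    for _ in range(epochs):
        for pool, a in dataset:
            logits = pool @ w
            logits = logits - logits.max()      # numerical stability
            p = np.exp(logits)
            p /= p.sum()
            grad = pool.T @ p - pool[a]         # gradient of -log p[a] w.r.t. w
            w = w - lr * grad
    return w

def dagger(n_iters=5, horizon=4, n_cands=8, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    dataset = []
    for _ in range(n_iters):
        pool = rng.random((n_cands, 2))         # fresh synthetic problem
        for _ in range(horizon):
            # The expert labels the state that the learner actually visits...
            dataset.append((pool.copy(), expert_action(pool)))
            # ...but the learner's own choice drives the rollout.
            a = learner_action(w, pool)
            pool = np.delete(pool, a, axis=0)
        w = fit(w, dataset)                     # retrain on the aggregated dataset
    return w, dataset

w, dataset = dagger()
```

The point of the sketch is the data flow: states are generated by rolling out the learner, labels come from the expert, and the learner is refit on everything collected so far, so it is trained on the states it tends to reach rather than only on the expert's trajectories.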
