Amortized nonmyopic active search via deep imitation learning

Quan Nguyen, Anindya Sarkar, Roman Garnett

TL;DR

Active search aims to find rare targets when labeling is costly: the goal is to maximize the utility $u(\mathcal{D}_T)$ of the data $\mathcal{D}_T$ collected under a labeling budget $T$. The state-of-the-art policy, ENS, makes nonmyopic decisions, but its cost is superlinear in the size $n$ of the search space. The authors propose ans, a neural policy trained by imitation learning (DAgger) to mimic ENS on synthetic problems generated from Gaussian processes, which enables real-time decisions. A compact state representation (including $\Pr(y=1\mid x,\mathcal{D})$, the remaining budget $\ell$, and neighbor-based summaries), combined with a GP-based problem generator and Faiss-based nearest-neighbor search, yields performance close to ENS at far lower computational cost. This approach broadens the applicability of nonmyopic active search to large-scale domains such as drug discovery and recommender systems, where millions of candidates must be explored under tight labeling budgets.
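As a rough illustration of the state representation mentioned above, the sketch below assembles a per-candidate feature vector from the current posterior probabilities, the remaining budget, and summaries over each point's nearest neighbors. The function names, the choice of mean and max as neighbor summaries, and the brute-force NumPy neighbor search (a small-scale stand-in for Faiss) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbors (excluding the point itself).

    Assumption: squared Euclidean distance; a library such as Faiss would
    replace this step at scale.
    """
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]  # shape (n, k)

def build_features(X, probs, budget_left, k=2):
    """Hypothetical feature matrix with one row per candidate point.

    Columns: Pr(y=1 | x, D), remaining budget, and the mean and max of the
    neighbors' probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    nbr_probs = probs[knn_indices(X, k)]  # shape (n, k)
    return np.column_stack([
        probs,
        np.full(len(probs), float(budget_left)),
        nbr_probs.mean(axis=1),
        nbr_probs.max(axis=1),
    ])

# Four points on a line with made-up posterior probabilities.
X_demo = np.array([[0.0], [1.0], [2.0], [10.0]])
probs_demo = np.array([0.1, 0.5, 0.9, 0.2])
features_demo = build_features(X_demo, probs_demo, budget_left=5, k=2)
```

Each row of `features_demo` has a fixed length regardless of $n$, which is what lets a single network score every candidate.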

Abstract

Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class. The state-of-the-art algorithm approximates the optimal Bayesian policy in a budget-aware manner, and has been shown to achieve impressive empirical performance in previous work. However, even this approximate policy has a superlinear computational complexity with respect to the size of the search problem, rendering its application impractical in large spaces or in real-time systems where decisions must be made quickly. We study the amortization of this policy by training a neural network to learn to search. To circumvent the difficulty of learning from scratch, we appeal to imitation learning techniques to mimic the behavior of the expert, expensive-to-compute policy. Our policy network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions carefully balancing exploration and exploitation. Extensive experiments demonstrate that our policy achieves competitive performance on real-world tasks, closely approximating the expert's at a fraction of the cost while outperforming cheaper baselines.
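The imitation-learning idea in the abstract can be sketched with a generic DAgger loop: roll out the current learner, ask the expert which action it would have taken in each visited state, aggregate those labels, and refit. The toy below is only an illustration under stated assumptions. It uses a tabular majority-vote learner and a hypothetical chain environment in place of the paper's policy network, ENS expert, and synthetic search problems; all names are made up for the example.

```python
import random
from collections import Counter, defaultdict

def dagger(env_reset, env_step, expert, n_iters=5, horizon=10, seed=0):
    """Generic DAgger loop with a tabular majority-vote learner.

    env_reset() -> state, env_step(state, action) -> next state, and
    expert(state) -> action are hypothetical callables with actions in {0, 1}.
    """
    rng = random.Random(seed)
    dataset = defaultdict(Counter)  # aggregated: state -> expert action counts

    def learner(state):
        votes = dataset.get(state)
        if votes:  # imitate the most frequent expert label for this state
            return votes.most_common(1)[0][0]
        return rng.choice([0, 1])  # unseen state: act randomly

    for _ in range(n_iters):
        state = env_reset()
        visited = []
        for _ in range(horizon):
            visited.append(state)
            state = env_step(state, learner(state))  # roll out the learner
        for s in visited:
            dataset[s][expert(s)] += 1  # relabel visited states with the expert
        # "Refitting" is implicit: the learner reads the aggregated dataset.
    return learner

# Toy chain 0..6: action 1 moves right, action 0 moves left.
def chain_reset():
    return 0

def chain_step(state, action):
    return min(6, state + 1) if action == 1 else max(0, state - 1)

def chain_expert(state):
    return 1 if state < 3 else 0  # head toward state 3

policy = dagger(chain_reset, chain_step, chain_expert, n_iters=5)
```

The key design point is that training states come from the learner's own rollouts rather than the expert's, so the learner sees (and gets corrected on) the states its own mistakes lead to.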

Paper Structure (17 sections, 2 equations, 8 figures, 2 tables, 2 algorithms)

Figures (8)

  • Figure 1: Demonstration of our trained policy's budget-awareness with a toy example. Left panel: the probability that a point is a target. Remaining panels: computed logits and the point selected to be the next query under different labeling budgets. Our policy appropriately balances between exploitation under a small labeling budget and strategic exploration if the budget is large.
  • Figure 2: The time taken per iteration by different active search policies in the small- and medium-scale experiments. Left: average number of seconds per iteration with respect to the size of the search space. Right: average number of targets found and standard errors vs. time per iteration.
  • Figure 3: Locations in Côte d'Ivoire selected by the one-step policy and by ours in an illustrative run with the disease hotspot data, where our policy discovers a larger target cluster.
  • Figure 4: The average difference in cumulative reward and standard errors between our policy and one-step. Our policy spends its initial budget exploring the space and finds fewer targets in the beginning but smoothly switches to more exploitative queries and outperforms one-step at the end.
  • Figure 5: An example two-dimensional problem generated by the paper's problem-generation algorithm, where bright and dark points indicate targets and non-targets, respectively. The search space includes both clusters and more widely dispersed points.
  • ...and 3 more figures
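
The quantity plotted in Figure 4 (average difference in cumulative reward, with standard errors) can be computed along the following lines. This is a minimal sketch that assumes per-query 0/1 rewards stored as arrays of shape (runs, queries) and runs paired across the two policies; the function name and the toy data are hypothetical, not results from the paper.

```python
import numpy as np

def cumulative_reward_gap(rewards_a, rewards_b):
    """Mean and standard error of the cumulative-reward difference (a - b).

    rewards_a, rewards_b: arrays of shape (n_runs, T) with one reward per
    query; row i of each array is assumed to come from the same run.
    """
    cum_a = np.cumsum(np.asarray(rewards_a, dtype=float), axis=1)
    cum_b = np.cumsum(np.asarray(rewards_b, dtype=float), axis=1)
    gap = cum_a - cum_b                                   # (n_runs, T)
    mean = gap.mean(axis=0)
    sem = gap.std(axis=0, ddof=1) / np.sqrt(gap.shape[0])
    return mean, sem

# Made-up rewards: policy A explores first, policy B exploits immediately.
rewards_a = [[0, 0, 1, 1], [0, 1, 1, 1]]
rewards_b = [[1, 0, 0, 1], [1, 1, 0, 0]]
gap_mean, gap_sem = cumulative_reward_gap(rewards_a, rewards_b)
```

A negative mean early on followed by a positive mean later corresponds to the explore-then-exploit pattern the caption describes.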