Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization

Deokjae Lee; Hyun Oh Song; Kyunghyun Cho


TL;DR

This work introduces a novel greedy-style subset selection algorithm that optimizes batch acquisition directly on the combinatorial space by sequential greedy sampling from a greedy policy, which is trained to address all greedy subproblems concurrently.

Abstract

Active learning is increasingly adopted for expensive multi-objective combinatorial optimization problems, but it involves a challenging subset selection problem: optimizing the batch acquisition score, which quantifies how good a batch is for evaluation. Because the search space of this subset selection problem is prohibitively large, prior methods either optimize the batch acquisition on a latent space, which does not match the actual space, or optimize individual acquisition scores while ignoring dependencies among the candidates in a batch, rather than directly optimizing the batch acquisition. A simple and effective way to manage the vast search space is the greedy method, which decomposes the problem into smaller subproblems; however, it is difficult to parallelize because each subproblem depends on the outcomes of the previous ones. To this end, we introduce a novel greedy-style subset selection algorithm that optimizes batch acquisition directly on the combinatorial space by sequential greedy sampling from a greedy policy, which is trained to address all greedy subproblems concurrently. Notably, our experiments on the red fluorescent protein design task show that our method reaches the baseline performance with 1.69x fewer queries, demonstrating its efficiency.
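To make the greedy decomposition concrete, here is a minimal sketch of plain greedy batch selection: at step t it adds the candidate with the largest marginal gain a(B ∪ {x}) - a(B) given the batch B built so far, which is why each subproblem depends on the previous ones. The acquisition `coverage` and the table `COVER` are hypothetical toy stand-ins for a real batch acquisition function; they are not from the paper.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence

Acquisition = Callable[[FrozenSet[Hashable]], float]


def marginal_gain(acq: Acquisition, x: Hashable, batch: FrozenSet[Hashable]) -> float:
    """Delta_a(x | B) = a(B ∪ {x}) - a(B)."""
    return acq(batch | {x}) - acq(batch)


def greedy_batch(acq: Acquisition, candidates: Sequence[Hashable], batch_size: int) -> List[Hashable]:
    """Plain greedy subset selection on a batch acquisition function."""
    batch: FrozenSet[Hashable] = frozenset()
    order: List[Hashable] = []
    for _ in range(batch_size):
        remaining = [x for x in candidates if x not in batch]
        if not remaining:
            break
        # Subproblem t needs the batch chosen in steps 1..t-1, so steps run sequentially.
        best = max(remaining, key=lambda x: marginal_gain(acq, x, batch))
        batch = batch | {best}
        order.append(best)
    return order


# Hypothetical toy acquisition: number of "regions" covered by the batch.
COVER = {"a": {1, 2, 3, 4}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def coverage(batch: FrozenSet[Hashable]) -> float:
    return len(set().union(*(COVER[x] for x in batch)))


print(greedy_batch(coverage, list(COVER), 2))  # prints ['a', 'c']
```

Ties are broken by candidate order here, which is one arbitrary choice among many.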


Paper Structure (32 sections, 22 theorems, 77 equations, 6 figures, 11 tables, 5 algorithms)


Introduction
Preliminaries
Expensive MOCO
Multi-Round Active Learning
Batch acquisition functions for MOCO
Reinforcement Learning for Single-Objective Combinatorial Optimization
Methods
Learning Greedy Policy
Architecture for set-conditioned policy
Bounds for Approximated Greedy Algorithm
Experiments
Settings
Results on Synthetic Tasks
Results on Batch BO Scenarios
Related Works
...and 17 more sections

Key Result

Lemma 3.3

$\mathrm{GS}(a,\pi_{\theta^*}^\text{set},n,l)$ samples exact greedy solutions almost surely if $\pi_{\theta^*}^\text{set}$ is the greedy policy.
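The intuition behind Lemma 3.3 can be illustrated with a toy sketch: if a set-conditioned policy puts all of its mass on the maximizer of the marginal gain given B, then sampling a batch one candidate at a time from it reproduces the exact greedy solution. The names `greedy_policy`, `uniform_policy`, `sample_batch`, and the toy acquisition are hypothetical; the sketch does not model the parameters n and l of GS, whose roles are not described in this summary.

```python
import random
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy acquisition: number of "regions" covered by the batch.
COVER = {"a": {1, 2, 3, 4}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def acq(batch: FrozenSet[str]) -> int:
    return len(set().union(*(COVER[x] for x in batch)))


def gain(x: str, batch: FrozenSet[str]) -> int:
    """Marginal gain Delta_a(x | B)."""
    return acq(batch | {x}) - acq(batch)


Policy = Callable[[FrozenSet[str]], Dict[str, float]]


def greedy_policy(batch: FrozenSet[str]) -> Dict[str, float]:
    """Degenerate policy: all probability mass on the argmax of Delta_a(. | B)."""
    remaining = [x for x in COVER if x not in batch]
    best = max(remaining, key=lambda x: gain(x, batch))
    return {x: 1.0 if x == best else 0.0 for x in remaining}


def uniform_policy(batch: FrozenSet[str]) -> Dict[str, float]:
    """Baseline policy for contrast: uniform over unused candidates."""
    remaining = [x for x in COVER if x not in batch]
    return {x: 1.0 / len(remaining) for x in remaining}


def sample_batch(policy: Policy, batch_size: int, rng: random.Random) -> List[str]:
    """Sequentially sample candidates, each conditioned on the set chosen so far."""
    batch: FrozenSet[str] = frozenset()
    order: List[str] = []
    for _ in range(batch_size):
        probs = policy(batch)
        items = list(probs)
        x = rng.choices(items, weights=[probs[i] for i in items], k=1)[0]
        batch = batch | {x}
        order.append(x)
    return order


rng = random.Random(0)
print([sample_batch(greedy_policy, 2, rng) for _ in range(3)])  # prints [['a', 'c'], ['a', 'c'], ['a', 'c']]
```

With `uniform_policy` the sampled batches vary from run to run, whereas the degenerate greedy policy always returns the greedy batch.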

Figures (6)

Figure 1: Visualization of our learning method (see the Learning Greedy Policy section). At a high level, a set-conditioned policy $\pi_\theta^\text{set}$ is trained to generate candidates that maximize the marginal gain $\Delta_a(\cdot\mid B)$ when conditioned on $B$, where $B$ is itself sampled by $\pi_\theta^\text{set}$.
Figure 2: Multi-round active learning results and the discovered frontiers on the RFP task under a query limit of $1024$. (a) Midpoint, lower, and upper boundaries show the 50th, 30th, and 70th percentiles, respectively, derived from 10 trials. (b) Colored circles indicate ancestor proteins.
Figure 3: Multi-round active learning results on the 3 Bigrams task when using NEHVI as the batch acquisition function under a query limit of $512$. Midpoint, lower, and upper boundaries show the 50th, 30th, and 70th percentiles, respectively, derived from 10 trials.
Figure 4: Multi-round active learning results on the RFP task when using UCBHVI as the batch acquisition function under a query limit of $512$. Midpoint, lower, and upper boundaries show the 50th, 30th, and 70th percentiles, respectively, derived from 10 trials.
Figure 5: Diversified subset selection results on the 2 Bigrams task across tradeoff parameters. For each value of the tradeoff parameter ($\beta$ for PC-MOGFN and $\lambda$ for Ours), we plot three points from three different runs.
...and 1 more figure
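The training idea in Figure 1 (condition on a set B sampled by the policy itself, then push the policy toward candidates with high marginal gain given B) can be sketched on a toy problem. This is a minimal illustration under strong assumptions: a lookup table of softmax logits per set B stands in for the paper's set-conditioned architecture, the acquisition is a hypothetical coverage count, and for determinism the update uses the exact expected policy gradient over x rather than a sampled estimate. The paper's actual objective and estimator are not given in this summary.

```python
import math
import random
from collections import defaultdict
from typing import Dict, FrozenSet

# Hypothetical toy acquisition: number of "regions" covered by the batch.
COVER = {"a": {1, 2, 3, 4}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
ITEMS = list(COVER)


def acq(batch: FrozenSet[str]) -> int:
    return len(set().union(*(COVER[x] for x in batch)))


def gain(x: str, batch: FrozenSet[str]) -> int:
    """Marginal gain Delta_a(x | B) = a(B ∪ {x}) - a(B)."""
    return acq(batch | {x}) - acq(batch)


class TabularSetPolicy:
    """Hypothetical set-conditioned policy: one softmax over unused items per set B."""

    def __init__(self) -> None:
        self.logits: Dict[FrozenSet[str], Dict[str, float]] = defaultdict(dict)

    def probs(self, batch: FrozenSet[str]) -> Dict[str, float]:
        remaining = [x for x in ITEMS if x not in batch]
        table = self.logits[batch]
        z = [table.get(x, 0.0) for x in remaining]
        m = max(z)  # subtract the max for numerical stability
        w = [math.exp(v - m) for v in z]
        s = sum(w)
        return {x: wi / s for x, wi in zip(remaining, w)}


def rollout_set(policy: TabularSetPolicy, steps: int, rng: random.Random) -> FrozenSet[str]:
    """Sample a conditioning set B by running the current policy for `steps` steps."""
    batch: FrozenSet[str] = frozenset()
    for _ in range(steps):
        p = policy.probs(batch)
        items = list(p)
        batch = batch | {rng.choices(items, weights=[p[i] for i in items], k=1)[0]}
    return batch


def train(batch_size: int = 2, iters: int = 2000, lr: float = 0.5, seed: int = 0) -> TabularSetPolicy:
    rng = random.Random(seed)
    policy = TabularSetPolicy()
    for _ in range(iters):
        # B is sampled by the policy itself, with a random prefix length.
        batch = rollout_set(policy, rng.randrange(batch_size), rng)
        p = policy.probs(batch)
        gains = {x: gain(x, batch) for x in p}
        baseline = sum(p[x] * gains[x] for x in p)  # expected marginal gain under pi(.|B)
        table = policy.logits[batch]
        for x in p:
            # Exact gradient of E_{x~pi(.|B)}[Delta_a(x|B)] w.r.t. the softmax logit of x.
            table[x] = table.get(x, 0.0) + lr * p[x] * (gains[x] - baseline)
    return policy


policy = train()
p0, p1 = policy.probs(frozenset()), policy.probs(frozenset({"a"}))
print(max(p0, key=p0.get), max(p1, key=p1.get))
```

On this toy problem the most likely candidate at each visited set should coincide with the greedy choice (first "a", then "c" given {"a"}), so sampling from the trained policy approximates greedy selection without solving each subproblem at query time.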

Theorems & Definitions (43)

Definition 2.1
Definition 3.1
Definition 3.2
Lemma 3.3
Definition 3.4
Theorem 3.5
Proof
Proposition 3.6
Lemma 3.7
Definition 3.8
...and 33 more
