Multi-Objective Coverage via Constraint Active Search

Zakaria Shams Siam, Xuefeng Liu, Chong Liu

TL;DR

The paper tackles the problem of rapidly identifying a small, representative set of samples whose objective-space outcomes broadly cover a feasible region defined by per-objective thresholds. It introduces MOC-CAS, an optimistic, UCB-based acquisition using independent Gaussian processes for each objective, coupled with a smoothed, differentiable objective that measures new objective-space coverage. The authors derive a three-step smoothing scheme (ball replacement, soft feasibility gate, soft set union) to obtain a closed-form, gradient-friendly acquisition, enabling efficient inner optimization. Empirically, MOC-CAS outperforms baselines on large-scale drug-discovery datasets for SARS-CoV-2 and cancer, achieving higher early discovery (AUP and positives) and better diversity (lower fill distance) under the same budget, with robust performance across five SMILES-derived objectives. This approach offers a practical, scalable path to accelerate scientific discovery under budget constraints by prioritizing coverage of the feasible objective region rather than Pareto-front tracing or input-space diversity.

Abstract

In this paper, we formulate a new multi-objective coverage (MOC) problem, where the goal is to identify a small set of representative samples whose predicted outcomes broadly cover the feasible multi-objective space. This problem matters in many critical real-world applications, e.g., drug discovery and materials design, because the representative set can be evaluated much faster than the whole feasible set, significantly accelerating the scientific discovery process. Existing methods cannot be applied directly, as they focus on either sample-space coverage or multi-objective optimization that targets the Pareto front, whereas chemically diverse samples often yield identical objective profiles and safety constraints are usually defined on the objectives. To solve the MOC problem, we propose a novel search algorithm, MOC-CAS, which employs an upper confidence bound-based acquisition function to select optimistic samples guided by Gaussian process posterior predictions. To enable efficient optimization, we develop a smoothed relaxation of the hard feasibility test and derive an approximate optimizer. Compared with competitive baselines, MOC-CAS empirically achieves superior performance across large-scale protein-target datasets for SARS-CoV-2 and cancer, each assessed on five objectives derived from SMILES-based features.
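
The acquisition described above can be illustrated with a small sketch. This is not the authors' formula: the source only states that the acquisition is UCB-based and that the hard feasibility test and coverage are smoothed. The sketch assumes that feasibility means every objective is at or above its threshold, that the optimistic estimate is `mean + sqrt(beta) * std` (one common UCB convention), that the soft gate is a product of sigmoids, and that new coverage is approximated by a product of Gaussian-kernel complements. The names `ucb`, `soft_gate`, `soft_novelty`, `acquisition`, and the parameter `temp` are hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ucb(mean, std, beta):
    """Optimistic per-objective estimate (assumed form: mean + sqrt(beta) * std)."""
    return [m + math.sqrt(beta) * s for m, s in zip(mean, std)]


def soft_gate(y, tau, temp=0.1):
    """Smooth stand-in for the hard test all(y_i >= tau_i): a product of sigmoids."""
    gate = 1.0
    for yi, ti in zip(y, tau):
        gate *= _sigmoid((yi - ti) / temp)
    return gate


def soft_novelty(y, selected, r):
    """Smooth stand-in for 'y lies outside every existing coverage ball'.

    Each ball of radius r is replaced by a Gaussian kernel, and the union of
    balls by a product of complements (both are illustrative assumptions).
    """
    novelty = 1.0
    for z in selected:
        d2 = sum((a - b) ** 2 for a, b in zip(y, z))
        novelty *= 1.0 - math.exp(-d2 / (2.0 * r * r))
    return novelty


def acquisition(mean, std, tau, selected, r, beta=4.0, temp=0.1):
    """Score a candidate from its per-objective posterior mean and std."""
    y = ucb(mean, std, beta)
    return soft_gate(y, tau, temp) * soft_novelty(y, selected, r)


# Hypothetical two-objective example: one point already selected at (0.8, 0.8).
selected = [[0.8, 0.8]]
tau = [0.5, 0.5]
candidates = {
    "duplicate": ([0.8, 0.8], [0.0, 0.0]),   # same outcome as a selected point
    "infeasible": ([0.2, 0.2], [0.0, 0.0]),  # below both thresholds
    "novel": ([1.2, 0.9], [0.0, 0.0]),       # feasible and away from the ball
}
scores = {
    name: acquisition(m, s, tau, selected, r=0.2)
    for name, (m, s) in candidates.items()
}
best = max(scores, key=scores.get)
```

Because every factor is smooth in the posterior mean and standard deviation, a score of this shape can be handed to a gradient-based inner optimizer, which is the motivation the abstract gives for the relaxation.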

Paper Structure

This paper contains 32 sections, 32 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: An example of the MOC problem where $f_1,f_2$ are two objective functions and $\tau_1, \tau_2$ are two thresholds defining the feasible region shown in light green. 15 samples are selected as representative feasible samples, each with a coverage ball whose radius is $r$.
  • Figure 2: Quantitative comparison on the SARS-CoV-2 3CLpro target dataset.
  • Figure 3: Quantitative comparison on the Cancer RTCB (top row) and Cancer 6T2W (bottom row) target datasets.
  • Figure 4: Ablation of coverage resolution $r$ on the SARS-CoV-2 3CLpro dataset.
  • Figure 5: Ablation of optimism schedule $\beta_t$ on the SARS-CoV-2 3CLpro dataset.
  • ...and 1 more figure
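
The setup in Figure 1 (thresholds $\tau_1, \tau_2$ defining a feasible region, and a coverage ball of radius $r$ around each selected sample) can be made concrete with a short sketch of the hard, unsmoothed quantities. It assumes that feasibility means $f_i \ge \tau_i$ for every objective, that distances are Euclidean in objective space, and that fill distance is the largest distance from any feasible reference point to its nearest selected feasible sample. The paper's exact evaluation protocol is not given in this section, so the function names and the finite `reference` set are hypothetical.

```python
import math


def is_feasible(y, tau):
    """Hard feasibility test: every objective meets its threshold (assumed >=)."""
    return all(yi >= ti for yi, ti in zip(y, tau))


def dist(a, b):
    """Euclidean distance between two objective vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def coverage_and_fill(selected, reference, tau, r):
    """Return (fraction of feasible reference points inside some ball, fill distance).

    Only feasible selected samples act as ball centers, and only feasible
    reference points count as targets to be covered.
    """
    centers = [s for s in selected if is_feasible(s, tau)]
    targets = [p for p in reference if is_feasible(p, tau)]
    if not centers or not targets:
        return 0.0, math.inf
    nearest = [min(dist(p, c) for c in centers) for p in targets]
    covered = sum(d <= r for d in nearest) / len(targets)
    return covered, max(nearest)


# Hypothetical two-objective outcomes with thresholds tau = (0, 0).
tau = [0.0, 0.0]
reference = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [-1.0, 5.0]]
selected = [[1.0, 1.0], [3.0, 3.0], [-2.0, -2.0]]
covered, fill = coverage_and_fill(selected, reference, tau, r=0.5)
```

A larger covered fraction and a smaller fill distance both indicate that the selected set represents the feasible objective region more evenly, which matches the direction of the diversity comparison reported in the TL;DR.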