ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge

Manh Cuong Dao, The Hung Tran, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang

TL;DR

This work addresses offline black-box optimization under limited offline data by reframing the problem as distributional translation from a low-value input distribution to a high-value one via a probabilistic bridge. It constructs localized translation flows conditioned on source and target endpoints using a Brownian-bridge diffusion variant and trains a target-agnostic translator with KL objectives, augmented by synthetic data drawn from ensembles of Gaussian processes. The ROOT framework pre-trains and adapts this bridge on a diverse suite of synthetic functions, enabling zero-shot generalization to the unknown target. On Design-Bench and RNA-binding benchmarks, the method achieves state-of-the-art performance and demonstrates robust data efficiency. The code is publicly available.

Abstract

This paper studies the offline black-box optimization task, which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved by learning a surrogate function from the offline data and then optimizing it. Alternatively, the task can be framed as inverse modeling, which maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge that transforms an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such a probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the posterior means of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising the most recent methods, demonstrating significant improvement and establishing new state-of-the-art performance. Our code is publicly available at https://github.com/cuong-dm/ROOT.
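To make the synthetic-function idea concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes a one-dimensional input, a zero-mean GP with a squared-exponential kernel, and that "different parameterizations" means different lengthscales. The helper names (`rbf`, `solve`, `gp_posterior_mean`, `synthetic_pairs`) and the rule of ranking random candidates to pick low- and high-value inputs are hypothetical simplifications introduced for illustration.

```python
import math
import random


def rbf(a, b, lengthscale):
    """Squared-exponential kernel between two 1-D inputs."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior_mean(xs, ys, lengthscale, noise=1e-2):
    """Return the posterior mean function of a zero-mean GP fitted to (xs, ys)."""
    K = [[rbf(a, b, lengthscale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # alpha = (K + noise * I)^{-1} y
    return lambda x: sum(w * rbf(x, xi, lengthscale) for w, xi in zip(alpha, xs))


def synthetic_pairs(xs, ys, lengthscales, n_candidates=200, frac=0.1, seed=0):
    """For each lengthscale, pair low-value with high-value inputs under that GP mean."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    k = max(1, int(frac * n_candidates))
    pairs = []
    for ls in lengthscales:
        f = gp_posterior_mean(xs, ys, ls)
        # Assumption: rank uniform random candidates by the synthetic function.
        cands = sorted((rng.uniform(lo, hi) for _ in range(n_candidates)), key=f)
        low, high = cands[:k], cands[-k:]
        pairs.extend((ls, xl, xh) for xl, xh in zip(low, high))
    return pairs
```

Calling `synthetic_pairs(xs, ys, lengthscales=[0.5, 1.0, 2.0])` on a small offline dataset yields `(lengthscale, low_input, high_input)` triples. Each lengthscale defines a different synthetic function, so the same offline data produces several distinct sets of low-to-high training pairs for a translator.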

Paper Structure

This paper contains 40 sections, 35 equations, 4 figures, 25 tables, 2 algorithms.

Figures (4)

  • Figure 1: Overview of the ROOT workflow: (1) Multiple Gaussian process posteriors are fitted to the offline data, and low- and high-value inputs from the posterior mean functions are sampled to construct a synthetic dataset. (2) This dataset is used to construct our probabilistic bridge model, which learns to map between two different implicit data distributions. (3) The backward process of the learned bridge model is applied to the top-performing inputs from the offline data to generate high-quality candidates for the unknown target function.
  • Figure 2: Score distribution of the candidates found by ROOT compared with those of other methods. To avoid cluttering the plots with long text, the abbreviation Tri-men* denotes Tri-mentoring.
  • Figure 3: Effect of lengthscale on performance for Ant and DKitty.
  • Figure 4: Distribution of pseudo-values (GP) and oracle values on low- and high-value regions.
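As a rough illustration of the bridge referred to in Figure 1, the sketch below samples an intermediate state of a textbook Brownian bridge pinned at two endpoints. At time t the state is Gaussian with mean (1 - t/T) x_start + (t/T) x_end and variance sigma^2 t (T - t) / T. This is not the paper's diffusion variant: the function name `bridge_sample`, the choice of which endpoint sits at time 0, and the parameters `T` and `sigma` are assumptions made for illustration only.

```python
import math
import random


def bridge_sample(x_start, x_end, t, T=1.0, sigma=1.0, rng=random):
    """Sample a Brownian bridge pinned at x_start (time 0) and x_end (time T) at time t."""
    if not 0.0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    m = t / T  # interpolation weight of the end point
    std = sigma * math.sqrt(t * (T - t) / T)  # vanishes at both endpoints
    return [(1.0 - m) * a + m * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x_start, x_end)]
```

The noise vanishes at both endpoints, so the process starts exactly at one input and ends exactly at the other. A learned model would then approximate the reverse of such a process in order to move from one endpoint distribution to the other.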