ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge
Manh Cuong Dao, The Hung Tran, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang
TL;DR
This work addresses offline black-box optimization under limited offline data by reframing the problem as distributional translation from a distribution of low-value inputs to a distribution of high-value inputs via a probabilistic bridge. It constructs localized translation flows conditioned on source and target endpoints using a Brownian-bridge diffusion variant and trains a target-agnostic translator with KL objectives, using synthetic data drawn from ensembles of Gaussian processes. The ROOT framework pre-trains and adapts this bridge on a diverse suite of synthetic functions, enabling zero-shot generalization to the unknown target function. On the Design-Bench and RNA-binding benchmarks, ROOT achieves state-of-the-art performance and shows robust data efficiency. The code is publicly available.
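To make the endpoint-conditioned bridge concrete, the following is a minimal sketch of sampling an intermediate state of a standard Brownian bridge pinned at a low-value input and a high-value input. It uses only the textbook marginal, N((1 - t) x_low + t x_high, sigma^2 t (1 - t) I); the diffusion variant used in ROOT may be parameterized differently. The function name `brownian_bridge_sample` and the `sigma` parameter are assumptions made for illustration.

```python
import numpy as np

def brownian_bridge_sample(x_low, x_high, t, sigma=1.0, rng=None):
    """Sample x_t from a standard Brownian bridge pinned at x_low (t=0) and x_high (t=1).

    Illustrative only: this is the textbook bridge marginal, not necessarily
    the exact variant used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_low = np.asarray(x_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    # The mean interpolates linearly between the two endpoints.
    mean = (1.0 - t) * x_low + t * x_high
    # The variance vanishes at both endpoints and peaks at t = 0.5.
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x_low.shape)

# Hypothetical endpoints: a low-value design and a high-value design.
x_mid = brownian_bridge_sample([0.0, 0.0], [1.0, 2.0], t=0.5,
                               rng=np.random.default_rng(0))
```

Because the noise scale is zero at t = 0 and t = 1, every sampled path starts exactly at the low-value input and ends exactly at the high-value input, which is what allows a translator trained on such paths to be conditioned on both endpoints.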
Abstract
This paper studies the black-box optimization task, which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved by learning a surrogate function from that offline data and then optimizing it. Alternatively, it can be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge that transforms an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such a probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the posterior means of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising the most recent methods, demonstrating significant improvement and establishing a new state of the art. Our code is publicly available at https://github.com/cuong-dm/ROOT.
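The construction of synthetic functions described above can be sketched as follows. This is a minimal NumPy illustration under several assumptions that are not taken from the paper: a zero-mean Gaussian process prior, an RBF kernel whose lengthscale is the only parameter that varies, a toy quadratic standing in for the unknown objective, and a simple ranking of random candidates to pick low- and high-value inputs. All names (`rbf_kernel`, `make_posterior_mean`, `synthetic_fns`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def make_posterior_mean(X, y, lengthscale, noise=1e-3):
    """Return the posterior mean of a zero-mean GP fitted to (X, y)."""
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)  # (K + noise * I)^{-1} y

    def mean_fn(X_query):
        return rbf_kernel(np.atleast_2d(X_query), X, lengthscale) @ alpha

    return mean_fn

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))  # toy offline inputs
y = -np.sum(X**2, axis=1)                 # toy stand-in for the black-box outputs

# One synthetic function per kernel parameterization (lengthscales chosen arbitrarily).
synthetic_fns = [make_posterior_mean(X, y, ls) for ls in (0.3, 0.6, 1.2)]

# For each synthetic function, pick low- and high-value inputs by ranking candidates.
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
pairs = []
for f in synthetic_fns:
    order = np.argsort(f(candidates))
    pairs.append((candidates[order[:20]], candidates[order[-20:]]))
```

Each entry of `pairs` holds low-value and high-value inputs under one synthetic function. Sets of this kind can serve as source and target samples for training a translator without further queries to the true objective.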
