Diffusion Model for Data-Driven Black-Box Optimization

Zihao Li; Hui Yuan; Kaixuan Huang; Chengzhuo Ni; Yinyu Ye; Minshuo Chen; Mengdi Wang

TL;DR

This paper proposes a reward-directed conditional diffusion model, trained on a mix of unlabeled and labeled data, that samples near-optimal solutions by conditioning on high predicted rewards. It also establishes sub-optimality error bounds for the generated designs.

Abstract

Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scenario where one wants to optimize some structured design in a high-dimensional space, based on massive unlabeled data (representing design variables) and a small labeled dataset. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. The goal is to generate new designs that are near-optimal and preserve the designed latent structures. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models for modeling complex distributions. In particular, we propose a reward-directed conditional diffusion model, to be trained on the mixed data, for sampling a near-optimal solution conditioned on high predicted rewards. Theoretically, we establish sub-optimality error bounds for the generated designs. The sub-optimality gap nearly matches the optimal guarantee in off-policy bandits, demonstrating the efficiency of reward-directed diffusion models for black-box optimization. Moreover, when the data admits a low-dimensional latent subspace structure, our model efficiently generates high-fidelity designs that closely respect the latent structure. We provide empirical experiments validating our model in decision-making and content-creation tasks.
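
The reformulation described in the abstract starts from a reward estimate learned on the small labeled set, which is then used to annotate the unlabeled designs before any diffusion model is trained. The sketch below illustrates only that pseudo-labeling step for real-valued rewards. It assumes a linear reward model fitted by ridge regression; the function names (`fit_reward_ridge`, `pseudo_label`), the synthetic data, and the regularization value are illustrative choices, not taken from the paper.

```python
import numpy as np

def fit_reward_ridge(x_labeled, y_labeled, lam=1e-3):
    """Ridge-regression reward estimator (an assumed model class).

    Returns theta minimizing ||X theta - y||^2 + lam * ||theta||^2.
    """
    d = x_labeled.shape[1]
    gram = x_labeled.T @ x_labeled + lam * np.eye(d)
    return np.linalg.solve(gram, x_labeled.T @ y_labeled)

def pseudo_label(x_unlabeled, theta):
    """Predicted reward for every unlabeled design."""
    return x_unlabeled @ theta

# Synthetic stand-in data: a small labeled set and a large unlabeled set.
rng = np.random.default_rng(0)
d = 8
theta_true = rng.normal(size=d)
x_labeled = rng.normal(size=(50, d))
y_labeled = x_labeled @ theta_true + 0.1 * rng.normal(size=50)  # noisy rewards
x_unlabeled = rng.normal(size=(1000, d))

theta_hat = fit_reward_ridge(x_labeled, y_labeled)
y_pseudo = pseudo_label(x_unlabeled, theta_hat)

# Each row is (design, predicted reward): the input to conditional training.
train_pairs = np.column_stack([x_unlabeled, y_pseudo])
```

A conditional diffusion model would then be trained on `train_pairs` and sampled with a high target reward as the conditioning value. For preference labels, the regression step would be replaced by a pairwise-comparison model, which this sketch does not cover.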

Paper Structure (60 sections, 18 theorems, 188 equations, 9 figures, 1 algorithm)

Introduction
Our Approach.
Contributions.
Related Work
Guided Diffusion Models
Theory of Diffusion Models
Black-Box Optimization
Problem Setup
Reward-Directed Generation via Conditional Diffusion Models
Meta Algorithm
Training of Conditional Diffusion Model
Conditional Score Matching
Conditioned Generation
Statistical Analysis of Reward-Directed Conditional Diffusion
Sub-Optimality Decomposition
...and 45 more sections

Key Result

Proposition 4.1

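For any $t > 0$ and score estimator $s$, there exists a constant $C_t$ independent of $s$ such that $$\mathbb{E}_{(x_t, y)} \left\| s(x_t, y, t) - \nabla_{x_t} \log p_t(x_t | y) \right\|_2^2 = \mathbb{E}_{(x, y)} \mathbb{E}_{x' \sim {\sf N}(\alpha(t)x, h(t)I_D)} \left\| s(x', y, t) - \nabla_{x'} \log \phi_t(x' | x) \right\|_2^2 + C_t,$$ where $p_t(\cdot | y)$ is the conditional density of the noised sample $x_t$ given $y$, $\nabla_{x'} \log \phi_t(x' | x) = -\frac{x' - \alpha(t)x}{h(t)}$, and $\phi_t(x' | x)$ is the density of ${\sf N}(\alpha(t)x, h(t)I_D)$ with $\alpha(t) = \exp(-t/2)$ and $h(t) = 1 - \exp(-t)$.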
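
Because the Gaussian kernel in Proposition 4.1 has a closed-form score, the regression target can be computed directly from a clean sample and its noised copy. The sketch below is a minimal Monte Carlo estimate of such a denoising objective at a single time $t$, using the $\alpha(t)$ and $h(t)$ given above. The helper names (`perturb`, `dsm_target`, `dsm_loss`, `toy_score`) and the synthetic data are hypothetical, and `toy_score` stands in for a trained network.

```python
import numpy as np

def alpha(t):
    """Mean scaling of the forward process: alpha(t) = exp(-t / 2)."""
    return np.exp(-t / 2.0)

def h(t):
    """Variance of the forward process: h(t) = 1 - exp(-t)."""
    return 1.0 - np.exp(-t)

def perturb(x, t, rng):
    """Draw x' ~ N(alpha(t) x, h(t) I) and also return the noise used."""
    noise = rng.normal(size=x.shape)
    return alpha(t) * x + np.sqrt(h(t)) * noise, noise

def dsm_target(x_noisy, x, t):
    """Score of the Gaussian kernel: -(x' - alpha(t) x) / h(t)."""
    return -(x_noisy - alpha(t) * x) / h(t)

def dsm_loss(score_fn, x, y, t, rng):
    """Monte Carlo estimate of E || s(x', y, t) - grad log phi_t(x' | x) ||^2."""
    x_noisy, _ = perturb(x, t, rng)
    residual = score_fn(x_noisy, y, t) - dsm_target(x_noisy, x, t)
    return np.mean(np.sum(residual ** 2, axis=1))

def toy_score(x_noisy, y, t):
    # Hypothetical placeholder for a network. It ignores y and returns -x',
    # which is the exact marginal score when the clean data are N(0, I).
    return -x_noisy

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 4))   # clean samples (synthetic)
y = rng.normal(size=2000)        # conditioning rewards (synthetic, unused by toy_score)
loss = dsm_loss(toy_score, x, y, t=1.0, rng=rng)
```

Even the exact score does not drive this loss to zero, which is consistent with the additive constant $C_t$ in the proposition: the constant does not depend on $s$, so minimizing the denoising objective still selects the same score function.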

Figures (9)

Figure 1: Generative AI for black-box optimization via conditional sampling. We convert the problem of black-box optimization into the problem of sampling from a conditional distribution learned from a pre-collected dataset.
Figure 2: Illustration of distribution shifts in samples and reward, as well as encoder-decoder score networks. When performing reward-directed conditional generation, (a) the distribution of the generated data shifts but stays close to the feasible data support; (b) the distribution of the rewards for the generated data shifts and the average reward improves; (c) the score network for reward-directed conditional diffusion adopts an encoder-decoder structure.
Figure 3: Overview of solving black-box optimization using reward-directed conditional diffusion models. We estimate the reward function from the labeled dataset. Then we compute the estimated reward for each instance in the unlabeled dataset. Next, we train a reward-conditioned diffusion model on the pseudo-labeled data and generate new high-reward instances under suitable target reward values.
Figure 4: Quality of generated samples as the target reward value increases. Left: average reward of the generated samples; Middle: distribution shift; Right: off-support deviation. Error bars show $2$ times the standard deviation over $5$ runs.
Figure 5: Shifting reward distribution of the generated population.
...and 4 more figures
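
The off-support deviation reported in Figure 4 relates to the low-dimensional subspace structure mentioned in the abstract. This summary does not give the paper's exact metric. One natural choice, assumed here, is the average norm of the component of each generated sample that lies outside the latent subspace. The function name `off_support_deviation`, the orthonormal `basis`, and the synthetic samples are all illustrative.

```python
import numpy as np

def off_support_deviation(samples, basis):
    """Mean norm of each sample's component orthogonal to span(basis).

    samples: (n, D) array of generated designs.
    basis:   (D, d) array with orthonormal columns spanning the subspace.
    """
    projected = samples @ basis @ basis.T
    return np.mean(np.linalg.norm(samples - projected, axis=1))

rng = np.random.default_rng(0)
D, d = 16, 4
# Orthonormal basis of a random d-dimensional subspace (reduced QR).
basis, _ = np.linalg.qr(rng.normal(size=(D, d)))

on_support = rng.normal(size=(100, d)) @ basis.T        # lies in the subspace
perturbed = on_support + 0.1 * rng.normal(size=(100, D))  # leaks off the subspace

dev_on = off_support_deviation(on_support, basis)
dev_off = off_support_deviation(perturbed, basis)
```

Under this definition, samples that respect the latent structure give a deviation near zero, and the value grows as generated designs drift away from the subspace.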

Theorems & Definitions (22)

Proposition 4.1: Score Matching Objective for Implementation
Remark 4.2: Alternative methods
Proposition 5.1
Theorem 5.4: Subspace Fidelity of Generated Data
Remark 5.5
Theorem 6.2
Theorem 6.3: Off-policy Regret of Generated Samples
Theorem 6.7
Theorem 6.9
Lemma B.1
...and 12 more
