Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation

Kim Yong Tan, Yueming Lyu, Ivor Tsang, Yew-Soon Ong

TL;DR

This work tackles online, query-efficient diffusion-model target generation with black-box objectives, a setting common in tasks such as image alignment and molecular design, where offline data or differentiable scores are unavailable. It introduces Noise Sequence Optimization with Target Guidance (GNSO), a backbone method that updates the diffusion noise along a universal direction on the data manifold, and builds Fast Direct on top of it by forming a pseudo-target $\hat{\boldsymbol{x}}^*$, via a lightweight Gaussian process (GP) surrogate or historical optimal updates, to guide inference. Empirically, Fast Direct achieves 6×–10× query-efficiency improvements on 1024×1024 image targets and 11×–44× on 3D-molecule targets, comparing favorably against strong baselines while requiring far fewer online evaluations. The approach is simple, scheduler-agnostic, and easily extensible, which makes it practical for real-world guided diffusion tasks with non-differentiable or costly feedback.

Abstract

Guided diffusion-model generation is a promising direction for customizing the generation process of a pre-trained diffusion model to address specific downstream tasks. Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. However, for most real-world tasks, such as image generation with human preferences, molecular generation for drug discovery, and material design, offline datasets are often unavailable and the objective functions are often not differentiable. Thus, we need an $\textbf{online}$ algorithm capable of collecting data during runtime and supporting a $\textbf{black-box}$ objective function. Moreover, the $\textbf{query efficiency}$ of the algorithm is also critical because evaluating the objective for a query is often expensive in real-world scenarios. In this work, we propose a novel and simple algorithm, $\textbf{Fast Direct}$, for query-efficient online black-box target generation. Fast Direct builds a pseudo-target on the data manifold and uses it to update the noise sequence of the diffusion model along a universal direction, which is promising for query-efficient guided generation. Extensive experiments on twelve high-resolution ($1024 \times 1024$) image target generation tasks and six 3D-molecule target generation tasks show $\textbf{6}\times$ to $\textbf{10}\times$ and $\textbf{11}\times$ to $\textbf{44}\times$ query-efficiency improvements, respectively. Our implementation is publicly available at: https://github.com/kimyong95/guide-stable-diffusion/tree/fast-direct
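The idea of updating a noise sequence along a direction that points at a target can be illustrated with a small toy. The sketch below is not the authors' algorithm: `toy_sampler` is a hypothetical linear stand-in for a pre-trained diffusion sampler, and the step size, iteration count, and dimensions are arbitrary assumptions. It only shows that shifting every noise vector by a multiple of $\boldsymbol{x}^* - \boldsymbol{x}_K$ (target minus final sample) moves the final sample toward the target in this simplified setting.

```python
import numpy as np

def toy_sampler(noises, decay=0.9, sigma=0.3):
    """Hypothetical stand-in for a diffusion sampler (an assumption, not a real model).

    Maps a noise sequence of shape (K + 1, d) to a final sample x_K of shape (d,).
    """
    x = noises[0]
    for eps in noises[1:]:
        x = decay * x + sigma * eps
    return x

def guide_toward_target(noises, target, step_size=0.2, n_iters=20):
    """Repeatedly shift the whole noise sequence along the direction target - x_K."""
    noises = noises.copy()
    errors = []
    for _ in range(n_iters):
        x_K = toy_sampler(noises)
        direction = target - x_K               # same direction for every step
        errors.append(float(np.linalg.norm(direction)))
        noises = noises + step_size * direction  # broadcasts over the sequence axis
    errors.append(float(np.linalg.norm(target - toy_sampler(noises))))
    return noises, errors

rng = np.random.default_rng(0)
noises = rng.standard_normal((11, 8))  # initial noise plus 10 per-step noises, d = 8
target = np.ones(8)                    # illustrative target, chosen arbitrarily
guided_noises, errors = guide_toward_target(noises, target)
print(f"distance to target: {errors[0]:.3f} -> {errors[-1]:.2e}")
```

Because the toy sampler is linear, each iteration shrinks the distance to the target by a constant factor; a real diffusion model is nonlinear, so no such guarantee should be read into this example.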

Paper Structure

This paper contains 21 sections, 1 theorem, 5 equations, 34 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

Given prior data $\boldsymbol{X}^n= [\boldsymbol{x}^1,\cdots,\boldsymbol{x}^n]$ and the corresponding scores $\boldsymbol{y}= [y^1,\cdots,y^n]$, let $\hat{f}(\boldsymbol{x}; \boldsymbol{X}^n)$ denote the posterior mean of the GP model. For shift-invariant kernels $k(\boldsymbol{z}_1,\boldsymbol{z}_2)$ …
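The statement of the proposition is cut off here, so the following sketch covers only the object it names: the posterior mean $\hat{f}(\boldsymbol{x}; \boldsymbol{X}^n)$ of a GP fitted to the data and scores. It uses the standard zero-prior-mean formula $\hat{f}(\boldsymbol{x}) = k(\boldsymbol{x}, \boldsymbol{X}^n)\,(\boldsymbol{K} + \sigma^2 \boldsymbol{I})^{-1}\boldsymbol{y}$. The RBF kernel (one common shift-invariant kernel), the lengthscale, the noise level, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel matrix; shift-invariant because it depends only on a - b."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def gp_posterior_mean(X_query, X_train, y_train, noise=1e-6, lengthscale=1.0):
    """Posterior mean of a zero-mean GP at X_query given (X_train, y_train)."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    weights = np.linalg.solve(K, y_train)  # (K + noise * I)^{-1} y
    return rbf_kernel(X_query, X_train, lengthscale) @ weights

rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 3))      # n = 5 prior points in 3 dimensions
y_train = np.sin(X_train.sum(axis=1))      # made-up scores for illustration
mean_at_train = gp_posterior_mean(X_train, X_train, y_train)
mean_at_new = gp_posterior_mean(rng.standard_normal((2, 3)), X_train, y_train)
```

With a very small noise term, the posterior mean nearly interpolates the observed scores at the training points, which is a quick sanity check for an implementation like this one.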

Figures (34)

  • Figure 1: Demonstration of guided generation for a given target by Algorithm \ref{QOBGwithTarget}. Column 2 (Update Direction) indicates the update term in Line 8 of Algorithm \ref{QOBGwithTarget}. Rows 1 to 3 analyze how different update directions affect the generated images. Rows 4 and 5 show that, with the update direction $\boldsymbol{x}^*-\boldsymbol{x}_K$, the diffusion model can generate visually satisfying images even when the target image is noisy. The noisy target image ($\boldsymbol{x}^*$) in row 4 is obtained by adding noise $\mathcal{N}(\boldsymbol{0},\boldsymbol{I})$ to the clean image, and the one in row 5 by adding noise $\mathcal{N}(\boldsymbol{0},9 \times \boldsymbol{I})$.
  • Figure 2: The generated images versus the number of batch queries for the prompt "deer-elephant" main_prompt. An extra batch-query budget (up to 500) is given to the baseline methods for demonstration.
  • Figure 3: The 32 randomly generated images for the prompt "deer-elephant" main_prompt, guided by Fast Direct (w/ EDM) with a budget of 50 batch queries.
  • Figure 4: The average Gemini rating (from 1 to 5, higher is better) of the generated images versus the number of batch queries on the 12 different tasks (prompt abbreviations are shown in brackets; see the complete prompts in Appendix \ref{tab:all_prompts}).
  • Figure 5: Left column: The average objective score of the generated images versus the number of batch queries on the 3 black-box optimization tasks. The images are generated using the 45 common animals used in DDPO \cite{tang2024tuning} as input prompts. Right column: The objective score of the images generated from unseen prompts, which demonstrates the generalization capability. Note that DNO is not applicable to this task.
  • ...and 29 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof