Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
Kim Yong Tan, Yueming Lyu, Ivor Tsang, Yew-Soon Ong
TL;DR
This work tackles online, query-efficient target generation with diffusion models under black-box objectives, a setting common in tasks such as image alignment and molecular design, where offline data or differentiable scores are unavailable. It introduces Guided Noise Sequence Optimization (GNSO), a backbone method that updates the diffusion noise sequence along a universal direction toward a target on the data manifold, and builds Fast Direct on top of it by forming a pseudo-target $\hat{x}^*$, obtained from a lightweight Gaussian process (GP) surrogate or from historical optimal updates, to guide inference. Empirically, Fast Direct achieves a 6× to 10× query efficiency improvement on 1024×1024 image targets and 11× to 44× on 3D-molecule targets, comparing favorably against strong baselines while requiring far fewer online evaluations. The approach is simple, scheduler-agnostic, and easily extensible, which makes it practical for real-world guided diffusion tasks with non-differentiable or costly feedback.
Abstract
Guided diffusion-model generation is a promising direction for customizing the generation process of a pre-trained diffusion model to address specific downstream tasks. Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. However, for many real-world tasks, such as image generation with human preferences, molecular generation for drug discovery, and material design, offline datasets are unavailable and the objective functions are not differentiable. Thus, we need an $\textbf{online}$ algorithm capable of collecting data during runtime and supporting a $\textbf{black-box}$ objective function. Moreover, the $\textbf{query efficiency}$ of the algorithm is critical because evaluating the objective for each query is often expensive in real-world scenarios. In this work, we propose a novel and simple algorithm, $\textbf{Fast Direct}$, for query-efficient online black-box target generation. Fast Direct builds a pseudo-target on the data manifold and updates the noise sequence of the diffusion model along a universal direction toward it, a design aimed at query-efficient guided generation. Extensive experiments on twelve high-resolution ($1024 \times 1024$) image target generation tasks and six 3D-molecule target generation tasks show query efficiency improvements of $\textbf{6}\times$ to $\textbf{10}\times$ and $\textbf{11}\times$ to $\textbf{44}\times$, respectively. Our implementation is publicly available at: https://github.com/kimyong95/guide-stable-diffusion/tree/fast-direct
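To make the online loop concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical `toy_sampler` in place of a pre-trained diffusion model, a hypothetical black-box `score` function, and a simplified pseudo-target that is just the best sample queried so far (the paper's GP-surrogate variant is not reproduced here). It only illustrates the idea of shifting every term of the noise sequence along one shared direction toward a pseudo-target while counting black-box queries.

```python
import math
import random


def toy_sampler(noise_seq):
    """Hypothetical stand-in for a pre-trained diffusion sampler.

    Maps a noise sequence (a list of `steps` vectors) to one sample vector.
    """
    steps = len(noise_seq)
    dim = len(noise_seq[0])
    return [sum(eps[i] for eps in noise_seq) / math.sqrt(steps) for i in range(dim)]


def guide_noise(noise_seq, pseudo_target, strength=0.5):
    """Shift every noise term along one shared direction, toward the pseudo-target."""
    steps = len(noise_seq)
    x = toy_sampler(noise_seq)
    # Dividing by sqrt(steps) makes the toy sample move by strength * (target - x).
    shift = [strength * (t - xi) / math.sqrt(steps) for t, xi in zip(pseudo_target, x)]
    return [[e + s for e, s in zip(eps, shift)] for eps in noise_seq]


def run_online_guidance(objective, dim=8, steps=10, batch=4, rounds=20, seed=0):
    """Toy online loop: sample, query the black box, refresh the pseudo-target."""
    rng = random.Random(seed)
    best_x, best_y, n_queries, history = None, -math.inf, 0, []
    for _ in range(rounds):
        for _ in range(batch):
            noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(steps)]
            if best_x is not None:
                noise = guide_noise(noise, best_x)
            x = toy_sampler(noise)
            y = objective(x)  # one black-box query; no gradients are used
            n_queries += 1
            if y > best_y:
                # Simplifying assumption: the best sample so far is the pseudo-target.
                best_x, best_y = x, y
        history.append(best_y)
    return best_x, best_y, n_queries, history


def score(x):
    """Hypothetical black-box objective: closeness to the all-ones vector."""
    return -sum((xi - 1.0) ** 2 for xi in x)


best_x, best_y, n_queries, history = run_online_guidance(score)
print(n_queries, round(history[0], 3), round(best_y, 3))
```

The printed values are the number of black-box queries, the best score after the first round, and the best score at the end. In this toy setting, query efficiency would be measured as the ratio of queries a baseline needs to the queries the guided loop needs to reach the same score.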
