Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation
Yueming Lyu, Kim Yong Tan, Yew Soon Ong, Ivor W. Tsang
TL;DR
The paper tackles targeted generation with diffusion models when only black-box scores are available. It formulates targeted generation as a sequential black-box optimization over the reverse-time SDE and introduces a covariance-adaptive sequential black-box optimization (CASBO) method with full covariance updates, achieving a convergence rate of $O(\frac{d^2}{\sqrt{T}})$ for convex objectives. The authors derive a closed-form, KL-regularized update rule for the optimization variables and validate the approach on numerical benchmarks and a 3D-molecule design task, showing superior target-score performance. They also demonstrate applicability to CLIP-guided targeted image generation with Stable Diffusion, indicating broad cross-domain utility of the approach.
Abstract
Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, performing user-preferred targeted generation via diffusion models with only black-box target scores from users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove an $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smoothness or strong-convexity assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.
