Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation

Yueming Lyu, Kim Yong Tan, Yew Soon Ong, Ivor W. Tsang

TL;DR

The paper tackles targeted generation with diffusion models when only black-box target scores are available. It formulates targeted generation as a sequential black-box optimization problem over the reverse-time SDE and introduces a covariance-adaptive sequential black-box optimization (CASBO) method with full covariance updates. CASBO achieves an $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex objectives. The authors derive a closed-form, KL-regularized update rule for the optimization variables. They validate the approach on numerical benchmarks and a 3D-molecule design task, where it attains better target scores. They also apply it to CLIP-guided targeted image generation with Stable Diffusion, which suggests that the approach carries over across domains.
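The "closed-form, KL-regularized update" with "full covariance" can be illustrated with a minimal sketch. This page does not reproduce the exact CASBO rule, so the code below is not that rule. It is a standard KL-proximal (natural-gradient) update for a single Gaussian search distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ that minimizes $J(\boldsymbol{\mu},\boldsymbol{\Sigma}) = \mathbb{E}[f(\mathbf{x})]$ for a black-box $f$. It assumes the form $\boldsymbol{\Sigma}_{t+1}^{-1} = \boldsymbol{\Sigma}_t^{-1} + 2\eta\nabla_{\boldsymbol{\Sigma}}J$ and $\boldsymbol{\mu}_{t+1} = \boldsymbol{\mu}_t - \eta\boldsymbol{\Sigma}_{t+1}\nabla_{\boldsymbol{\mu}}J$, with Monte Carlo score-function estimates of both gradients. All of the following are hypothetical choices: the function name, the step size $\eta$, the score normalization, the eigenvalue-clipping safeguard, and the toy quadratic objective. The sequential, per-step structure of CASBO is omitted.

```python
import numpy as np


def kl_regularized_gaussian_step(f, mu, prec, eta, n_samples, rng):
    """One KL-proximal (natural-gradient) update of N(mu, inv(prec)) to minimize E[f(x)].

    Hypothetical sketch, not the paper's CASBO rule:
        prec_new = prec + 2 * eta * grad_Sigma J
        mu_new   = mu - eta * inv(prec_new) @ grad_mu J
    """
    d = mu.shape[0]
    cov = np.linalg.inv(prec)
    cov = 0.5 * (cov + cov.T)                      # keep the covariance exactly symmetric
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_samples, d))
    x = mu + z @ chol.T                            # samples x_i ~ N(mu, cov)
    fx = np.array([f(xi) for xi in x])             # black-box queries: values only, no gradients
    # Baseline-subtracted, scale-normalized scores (a variance-reduction choice, assumed here).
    f_tilde = (fx - fx.mean()) / (fx.std() + 1e-12)
    w = (x - mu) @ prec                            # row i is Sigma^{-1}(x_i - mu), since prec is symmetric
    grad_mu = (f_tilde[:, None] * w).mean(axis=0)
    # 2 * grad_Sigma J ~= mean_i f_tilde_i * (w_i w_i^T - prec); the "- prec" part is zero
    # because f_tilde has zero mean, so only the weighted outer products remain.
    two_grad_sigma = (w.T * f_tilde) @ w / n_samples
    prec_new = prec + eta * two_grad_sigma
    prec_new = 0.5 * (prec_new + prec_new.T)
    # Safeguard (assumption): clip eigenvalues so the precision matrix stays positive definite.
    vals, vecs = np.linalg.eigh(prec_new)
    vals = np.maximum(vals, 1e-8 * np.abs(vals).max())
    prec_new = (vecs * vals) @ vecs.T
    mu_new = mu - eta * np.linalg.solve(prec_new, grad_mu)
    return mu_new, prec_new


def toy_score(x):
    # Hypothetical convex black-box score with its minimizer at the all-ones vector.
    return float(np.sum((x - 1.0) ** 2))


rng = np.random.default_rng(0)
mu, prec = np.zeros(5), np.eye(5)
initial_score = toy_score(mu)
for _ in range(150):
    mu, prec = kl_regularized_gaussian_step(toy_score, mu, prec, eta=0.2, n_samples=50, rng=rng)
final_score = toy_score(mu)
print(initial_score, final_score)
```

The update touches only function values, and the full precision matrix is updated in closed form. That is why the covariance can adapt in every direction rather than only along coordinate axes.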

Abstract

Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, performing user-preferred targeted generation with diffusion models remains challenging when only black-box target scores from users are available. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove an $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smoothness or strong-convexity assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show that our method achieves better target scores.
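The sequential formulation above can be made concrete with a toy rollout. This excerpt does not give the paper's parameterization of the targeted SDE, so the setup here is an assumption. Each reverse-time step $k$ receives an additive control $\mathbf{u}_k$. The pre-trained drift plays the role of unknown transition dynamics. The optimizer observes only the cumulative score $\sum_k f(\mathbf{x}_k)$. The names `toy_pretrained_drift`, `rollout_cumulative_score` and `score_fn`, and all constants, are hypothetical.

```python
import numpy as np


def toy_pretrained_drift(x, s):
    # Hypothetical stand-in for the pre-trained reverse-time drift; the optimizer never inspects it.
    return -(1.0 + s) * x


def rollout_cumulative_score(controls, score_fn, x0, step, noise_scale, rng):
    """Run len(controls) Euler-Maruyama steps with an additive per-step control u_k
    and return sum_k score_fn(x_k), the only quantity exposed to the optimizer."""
    x = np.array(x0, dtype=float)
    num_steps = len(controls)
    total = 0.0
    for k, u in enumerate(controls):
        s = 1.0 - k / num_steps                    # toy time index running from 1 toward 0
        noise = rng.standard_normal(x.shape)
        x = x + (toy_pretrained_drift(x, s) + u) * step + noise_scale * np.sqrt(step) * noise
        total += score_fn(x)                       # black-box score of the intermediate state
    return total


target = np.array([2.0, 2.0])


def score_fn(x):
    # Hypothetical target score (lower is better): squared distance to a target point.
    return float(np.sum((x - target) ** 2))


K = 20
zero_controls = [np.zeros(2)] * K
push_controls = [np.full(2, 4.0)] * K
# The same seed gives both rollouts identical noise, so only the controls differ.
uncontrolled_total = rollout_cumulative_score(
    zero_controls, score_fn, np.zeros(2), 1.0 / K, 0.1, np.random.default_rng(0))
controlled_total = rollout_cumulative_score(
    push_controls, score_fn, np.zeros(2), 1.0 / K, 0.1, np.random.default_rng(0))
print(uncontrolled_total, controlled_total)
```

Reusing one seed for both rollouts (common random numbers) isolates the effect of the controls. A black-box optimizer would then search over $\mathbf{u}_1,\dots,\mathbf{u}_K$ using only such scalar returns.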

Paper Structure (21 sections, 8 theorems, 96 equations, 12 figures, 1 table, 2 algorithms)

Key Result

Theorem 4.4

Suppose Assumptions 1, 2, and 4 hold. Set $\beta_t = t\beta$ with $\beta > 0$, $\alpha_t = \sqrt{t+1}\,\alpha$ with $\alpha > 0$, $\gamma_t = \frac{\alpha\nu}{\beta\sqrt{t+1}}$ with $\nu > 0$, and $\omega_t = 1$. Initialize $\boldsymbol{\Sigma}_{k}^1$ such that $\| \cdots$ … where $\bar{\boldsymbol{\mu}}_k^t = [\boldsymbol{\mu}_1^{t\top}, \cdots, \boldsymbol{\mu}_k^{t\top}]^\top$ …
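The parameter schedules in Theorem 4.4 can be evaluated directly. The sketch below encodes only the stated formulas and assumes $t \ge 1$. It reports the order $d^2/\sqrt{T}$ of the bound with constants omitted and does not reproduce the truncated initialization condition. The function names are hypothetical.

```python
import math


def theorem_4_4_schedules(t: int, alpha: float, beta: float, nu: float):
    """Return (beta_t, alpha_t, gamma_t, omega_t) as set in Theorem 4.4 (t >= 1 assumed)."""
    if t < 1 or min(alpha, beta, nu) <= 0:
        raise ValueError("need t >= 1 and alpha, beta, nu > 0")
    beta_t = t * beta                                  # beta_t = t * beta
    alpha_t = math.sqrt(t + 1) * alpha                 # alpha_t = sqrt(t + 1) * alpha
    gamma_t = alpha * nu / (beta * math.sqrt(t + 1))   # gamma_t = alpha * nu / (beta * sqrt(t + 1))
    omega_t = 1.0                                      # omega_t = 1
    return beta_t, alpha_t, gamma_t, omega_t


def rate_order(d: int, T: int) -> float:
    """d^2 / sqrt(T): the order of the convergence bound, constants omitted."""
    return d ** 2 / math.sqrt(T)


for t in (1, 2, 3):
    beta_t, alpha_t, gamma_t, omega_t = theorem_4_4_schedules(t, alpha=2.0, beta=0.5, nu=1.0)
    print(t, beta_t, alpha_t, gamma_t, alpha_t * gamma_t)
```

Two consequences follow from the formulas alone. First, $\alpha_t\gamma_t = \alpha^2\nu/\beta$ is constant in $t$. Second, the bound's order halves when $T$ is quadrupled and quadruples when $d$ is doubled.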

Figures (12)

  • Figure 1: Objective Values (Cumulative Target Score, Lower is Better) vs. Number of Optimization Steps on Different Test Problems
  • Figure 2: Objective Values (Cumulative Vina Docking Score, Lower is Better) vs. Number of Optimization Steps (Fine-tune Steps) for Different Receptors
  • Figure 3: Demonstration of the Generated 3D-molecule on Receptor-0
  • Figure 4: Demonstration of the Generated 3D-molecule on Receptor-1
  • Figure 5: Demonstration of the Generated 3D-molecule on Receptor-2
  • ...and 7 more figures

Theorems & Definitions (17)

  • Theorem 4.4
  • Lemma 8.1
  • proof
  • Lemma 8.2
  • proof
  • proof
  • proof
  • Lemma 8.3
  • proof
  • Lemma 8.4
  • ...and 7 more