SPARKE: Scalable Prompt-Aware Diversity and Novelty Guidance in Diffusion Models via RKE Score

Mohammad Jalali, Haoyu Lei, Amin Gohari, Farzan Farnia

TL;DR

SPARKE tackles the challenge of achieving high-quality, diverse outputs in prompt-guided diffusion models by introducing prompt-aware diversity guidance based on Conditional Rényi Kernel Entropy (Conditional-RKE). By operating in the latent space of Latent Diffusion Models and specializing to the order-2 Rényi entropy, SPARKE reduces the cost of entropy and gradient computation from $O(n^3)$ to $O(n)$, enabling scalable sampling across thousands of generations conditioned on semantically similar prompts. The framework includes both unconditional (IRKE) and conditional (Cond-IRKE) diversity objectives, with Hadamard-product kernel conditioning that weights diversity by prompt similarity, and demonstrates improved prompt-aware diversity while preserving fidelity across state-of-the-art models such as Stable Diffusion v2.1, SDXL, and PixArt-$\Sigma$. The results indicate that latent-space, kernel-entropy guidance can deliver more varied, region- and prompt-sensitive outputs at a fraction of the computational cost of prior approaches, with potential extensions to other modalities such as video. Key contributions include: (i) introducing Conditional-RKE as a scalable prompt-aware diversity score; (ii) deriving efficient gradient forms for IRKE and Cond-IRKE in latent space; (iii) integrating prompt-conditioned diversity into diffusion sampling with substantial efficiency gains; and (iv) empirical validation showing improved diversity metrics and maintained fidelity on multiple diffusion-model backbones, along with novelty guidance capabilities using a reference dataset.
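The order-2 specialization and the Hadamard-product conditioning mentioned above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the function names, and the conditional form "entropy of the elementwise product minus entropy of the prompt kernel" are all assumptions made for illustration. The point it shows is that the order-2 Rényi entropy of the trace-normalized kernel matrix $K/n$ equals $-\log \|K/n\|_F^2$, so no eigendecomposition is required.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix; k(x, x) = 1, so the kernel is normalized."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def order2_entropy(K):
    """Order-2 Renyi entropy of K/n via the Frobenius norm (no eigenvalues)."""
    n = K.shape[0]
    return -np.log(np.sum((K / n) ** 2))

def order2_entropy_eig(K):
    """Same quantity from the eigenvalues of K/n, for comparison only."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)
    return -np.log(np.sum(lam**2))

def conditional_order2_entropy(K_z, K_t):
    """Assumed conditional form: H2(K_z * K_t) - H2(K_t), with * elementwise.

    K_z is the kernel matrix of the generated samples (or latents) and
    K_t is the kernel matrix of the corresponding prompt embeddings.
    """
    return order2_entropy(K_z * K_t) - order2_entropy(K_t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(8, 4))   # hypothetical latent vectors
    prompts = rng.normal(size=(8, 3))   # hypothetical prompt embeddings
    K_z, K_t = gaussian_kernel(latents), gaussian_kernel(prompts)
    print("H2 (Frobenius):", order2_entropy(K_z))
    print("H2 (eigenvalues):", order2_entropy_eig(K_z))
    print("conditional H2:", conditional_order2_entropy(K_z, K_t))
```

Two limiting cases make the role of the prompt kernel concrete under this assumed form: if all prompts are identical (`K_t` is all ones), the conditional value reduces to the unconditional entropy of `K_z`; if no two prompts are similar (`K_t` is the identity), it is zero, since no pair of samples is compared.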

Abstract

Diffusion models have demonstrated remarkable success in high-fidelity image synthesis and prompt-guided generative modeling. However, ensuring adequate diversity in generated samples of prompt-guided diffusion models remains a challenge, particularly when the prompts span a broad semantic spectrum and the diversity of generated data needs to be evaluated in a prompt-aware fashion across semantically similar prompts. Recent methods have introduced guidance via diversity measures to encourage more varied generations. In this work, we extend the diversity measure-based approaches by proposing the Scalable Prompt-Aware Rényi Kernel Entropy Diversity Guidance (SPARKE) method for prompt-aware diversity guidance. SPARKE utilizes conditional entropy for diversity guidance, which dynamically conditions diversity measurement on similar prompts and enables prompt-aware diversity control. While the entropy-based guidance approach enhances prompt-aware diversity, its reliance on matrix-based entropy scores poses computational challenges in large-scale generation settings. To address this, we focus on the special case of Conditional latent RKE Score Guidance, reducing entropy computation and gradient-based optimization complexity from the $O(n^3)$ of general entropy measures to $O(n)$. The reduced computational complexity allows for diversity-guided sampling over potentially thousands of generation rounds on different prompts. We numerically test the SPARKE method on several text-to-image diffusion models, demonstrating that the proposed method improves the prompt-aware diversity of the generated data without incurring significant computational costs. We release our code on the project page: https://mjalali.github.io/SPARKE
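One way to see where an $O(n)$ cost can come from: when the $n$-th sample is added, the squared Frobenius norm of the kernel matrix changes only through the cross terms $k(\boldsymbol{z}^{(n)}, \boldsymbol{z}^{(i)})^2$ for $i < n$, so a gradient with respect to the new sample touches $n-1$ kernel values rather than the whole matrix. The sketch below illustrates this with plain vectors and a Gaussian kernel. It is a hedged illustration, not the paper's algorithm: `cross_term`, `cross_term_grad`, the weights `w` (standing in for squared prompt-kernel similarities), and the step size are hypothetical names and choices.

```python
import numpy as np

def cross_term(z, Z_prev, w, sigma=1.0):
    """sum_i w_i * k(z, z_i)^2 for a Gaussian kernel with bandwidth sigma.

    Only this part of the squared Frobenius norm depends on the new sample z.
    With k = exp(-||d||^2 / (2 sigma^2)), the square is exp(-||d||^2 / sigma^2).
    """
    diff = z[None, :] - Z_prev
    return np.sum(w * np.exp(-np.sum(diff**2, axis=1) / sigma**2))

def cross_term_grad(z, Z_prev, w, sigma=1.0):
    """Closed-form gradient of cross_term w.r.t. z; cost is linear in len(Z_prev)."""
    diff = z[None, :] - Z_prev                           # shape (n-1, d)
    k_sq = np.exp(-np.sum(diff**2, axis=1) / sigma**2)   # squared kernel values
    return (-2.0 / sigma**2) * ((w * k_sq)[:, None] * diff).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z_prev = rng.normal(size=(5, 3))   # previously generated (hypothetical) latents
    z = rng.normal(size=3)             # latent currently being generated
    w = rng.uniform(size=5)            # assumed squared prompt similarities in [0, 1]
    g = cross_term_grad(z, Z_prev, w)
    z_new = z - 0.01 * g               # small step that lowers similarity to past samples
    print(cross_term(z, Z_prev, w), "->", cross_term(z_new, Z_prev, w))
```

Setting every entry of `w` to 1 gives the unconditional case; with prompt-dependent weights, past samples generated from dissimilar prompts contribute little to the gradient, which matches the prompt-aware behavior the abstract describes.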

Paper Structure

This paper contains 22 sections, 2 theorems, 51 equations, 19 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Let $Z = \{\boldsymbol{z}^{(1)}, \boldsymbol{z}^{(2)}, \ldots, \boldsymbol{z}^{(n)}\}$ denote a set of $n$ generated data. Let kernel function $k:\mathcal{Z}\times \mathcal{Z}\rightarrow \mathbb{R}$ be symmetric and normalized, i.e. $k(\boldsymbol{z},\boldsymbol{z})=1$ for every $\boldsymbol{z}\in\m

Figures (19)

  • Figure 1: Overview of the proposed SPARKE method in generating images at different iterations in comparison to the vanilla Stable Diffusion-XL [podell2024sdxl] model. We also compare the conditional-RKE guidance in SPARKE with the baseline Vendi Score guidance [askari2024improving] (unconditional, in latent space).
  • Figure 2: Comparison of latent entropy-based diversity guidance (ours) vs. ambient entropy diversity guidance in Latent Diffusion Models (LDMs). The experiment is performed with the SD-XL LDM.
  • Figure 3: Qualitative comparison of samples generated by the base latent diffusion models (LDMs), PixArt-$\Sigma$ and Stable Diffusion XL, vs. the same LDMs guided via our proposed SPARKE guidance.
  • Figure 4: Comparison of SPARKE (Conditional RKE) Guidance with baselines on 2D GMMs.
  • Figure 5: Comparison of SPARKE prompt-aware diversity guidance via the conditional-RKE score vs. prompt-unaware diversity guidance using the RKE score on SD 2.1 text-to-image generation.
  • ...and 14 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • proof
  • proof