Evolve to Inspire: Novelty Search for Diverse Image Generation

Alex Inch, Passawis Chaiyapattanaporn, Yuchen Zhu, Yuan Lu, Ting-Wen Ko, Davide Paglieri

TL;DR

Wander reframes image generation as novelty-driven prompt evolution, using an LLM as the mutation engine and CLIP embeddings to quantify image novelty. Emitters (human-designed mutation strategies) steer the search into diverse regions of the prompt space, while a fixed-size pool and a bandit-like selection mechanism keep token usage low. Empirically, Wander achieves higher diversity (measured by the Vendi score) and comparable relevance with fewer tokens than baselines; more capable LLMs boost diversity further at the cost of higher token usage. The approach is model-agnostic and scalable, enabling open-ended creative exploration with diffusion models, with potential extensions to other modalities and to applications such as data augmentation.
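For readers unfamiliar with the Vendi score, the sketch below shows its standard definition: the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix. The function name `vendi_score` and the choice of a cosine-similarity kernel over image embeddings are assumptions made for illustration; the page does not state which kernel the authors use.

```python
import numpy as np

def vendi_score(embeddings):
    """Vendi score of a set of embedding vectors (illustrative sketch).

    Assumes a cosine-similarity kernel, so every item has self-similarity 1.
    """
    X = np.asarray(embeddings, dtype=float)
    # L2-normalize rows so that X @ X.T is the cosine-similarity matrix.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    K = X @ X.T
    n = K.shape[0]
    # Eigenvalues of K / n are non-negative and sum to 1.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues (0 * log 0 is treated as 0).
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

Under this definition the score behaves like an effective number of distinct items: a set of identical embeddings scores 1, and a set of n mutually orthogonal embeddings scores n.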

Abstract

Text-to-image diffusion models, while proficient at generating high-fidelity images, often suffer from limited output diversity, hindering their application in exploratory and ideation tasks. Existing prompt optimization techniques typically target aesthetic fitness or are ill-suited to the creative visual domain. To address this shortcoming, we introduce WANDER, a novelty search-based approach to generating diverse sets of images from a single input prompt. WANDER operates directly on natural language prompts, employing a Large Language Model (LLM) for semantic evolution of diverse sets of images, and using CLIP embeddings to quantify novelty. We additionally apply emitters to guide the search into distinct regions of the prompt space, and demonstrate that they boost the diversity of the generated images. Empirical evaluations using FLUX-DEV for generation and GPT-4o-mini for mutation demonstrate that WANDER significantly outperforms existing evolutionary prompt optimization baselines in diversity metrics. Ablation studies confirm the efficacy of emitters.
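As a rough illustration of how embeddings can quantify novelty, the sketch below scores a candidate image embedding by its mean cosine distance to its k nearest neighbors among previously generated images. This is a common formulation in the novelty search literature, not necessarily the exact measure used by WANDER; the function name `novelty_score`, the parameter `k`, and the use of cosine distance are all assumptions. The embeddings are taken as given (for example, produced by a CLIP image encoder).

```python
import numpy as np

def novelty_score(candidate, archive, k=3):
    """Mean cosine distance from `candidate` to its k nearest archive entries.

    Illustrative sketch only; the paper's exact novelty measure may differ.
    """
    c = np.asarray(candidate, dtype=float)
    A = np.asarray(archive, dtype=float)
    # Normalize so that dot products are cosine similarities.
    c = c / np.linalg.norm(c)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    distances = 1.0 - A @ c          # cosine distance to every archive item
    k = min(k, len(distances))       # guard against a small archive
    nearest = np.sort(distances)[:k] # the k smallest distances
    return float(nearest.mean())
```

A candidate that duplicates an existing image scores near 0, while one far from everything in the archive scores higher, so selecting for high scores pushes the search toward unexplored regions.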

Paper Structure

This paper contains 31 sections, 3 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Our method generates significantly more diverse images than reusing a prompt multiple times.
  • Figure 2: An overview of the Wander workflow.
  • Figure 3: Examples of LLM prompt mutation and crossover.
  • Figure 4: Ablation over emitter selection strategies. The results presented are averaged over 10 runs for each of 10 prompts (n=100 samples per method). For comparability, the Vendi score was min-max normalized per prompt.
  • Figure 5: Over longer runs, the Vendi score rises consistently, plateauing around the $100^{\text{th}}$ generation. Results are averaged over 10 runs; the shaded area indicates the standard error.
  • ...and 5 more figures