Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example

Aven-Le Zhou; Yu-Ao Wang; Wei Wu; Kang Zhang

TL;DR

This work tackles the inefficiency and non-determinism of prompting large text-to-image models by introducing a prompting-free personalization pipeline that combines semantic injection with real-time, human-in-the-loop genetic prompting optimization. An artist model is built by semantically injecting Kandinsky/Bauhaus attributes via fast LoRA and DiffLoRA, then paired with a genetic algorithm that evolves prompts from user feedback, so that users obtain personalized outputs without writing prompts themselves. The authors create a Kandinsky-focused dataset and report two experiments: the first establishes an artist-tuned diffusion setup, and the second lets users converge on a preferred prompting strategy within a handful of iterations. The approach aims to democratize access to personalized, stylistically consistent image generation, and the authors provide open-source data and code to support further research and reuse.

Abstract

With the advancement of neural generative capabilities, the art community has actively embraced GenAI (generative artificial intelligence) for creating painterly content. Large text-to-image models can quickly generate aesthetically pleasing outcomes. However, the process can be non-deterministic and often involves tedious trial and error, as users struggle to formulate effective prompts that achieve their desired results. This paper introduces a prompting-free generative approach that lets users automatically generate personalized painterly content that incorporates their aesthetic preferences in a customized artistic style. The approach uses "semantic injection" to customize an artist model in a specific artistic style, and then leverages a genetic algorithm to optimize the prompt generation process through real-time, iterative human feedback. Relying solely on the user's aesthetic evaluation of, and preference among, the images generated by the artist model, the approach creates a personalized model for the user that encompasses both their aesthetic preferences and the customized artistic style.
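The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute table, the function names, the selection scheme, and the mutation rate are all assumptions made for illustration, and the `rate_fn` callback stands in for a user rating the images that the artist model would generate from each prompt. The default of five iterations follows the caption of Figure 2.

```python
import random

# Hypothetical attribute-value table; the paper's actual semantic
# descriptive guideline is not reproduced here.
ATTRIBUTES = {
    "shape": ["circles", "triangles", "grids", "arcs"],
    "color": ["primary colors", "warm tones", "cool tones"],
    "composition": ["balanced", "dynamic diagonal", "scattered"],
}

def random_genome(rng):
    """Pick one value per attribute; a genome encodes one candidate prompt."""
    return {k: rng.choice(v) for k, v in ATTRIBUTES.items()}

def to_prompt(genome):
    """Render a genome as a comma-separated prompt string."""
    return ", ".join(genome[k] for k in ATTRIBUTES)

def crossover(a, b, rng):
    """Uniform crossover: each attribute is inherited from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in ATTRIBUTES}

def mutate(genome, rng, rate=0.1):
    """Resample each attribute with probability `rate` (assumed value)."""
    return {k: (rng.choice(ATTRIBUTES[k]) if rng.random() < rate else v)
            for k, v in genome.items()}

def next_generation(population, scores, rng, elite=2):
    """Keep the top-rated genomes, then breed the rest from the better half."""
    ranked = [g for _, g in sorted(zip(scores, population),
                                   key=lambda pair: pair[0], reverse=True)]
    parents = ranked[:max(2, len(ranked) // 2)]
    children = [dict(g) for g in ranked[:elite]]
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children

def personalize(rate_fn, pop_size=8, iterations=5, seed=0):
    """Evolve prompts using only ratings; the user never writes a prompt."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(iterations):
        # In a real system, each prompt would be rendered by the artist
        # model and the resulting image shown to the user for rating.
        scores = [rate_fn(to_prompt(g)) for g in population]
        population = next_generation(population, scores, rng)
    scores = [rate_fn(to_prompt(g)) for g in population]
    best = max(zip(scores, population), key=lambda pair: pair[0])[1]
    return to_prompt(best)
```

Calling `personalize` with a stand-in rating function, for example one that scores prompts containing "circles" more highly, returns the highest-rated prompt after the final generation. Because the top-rated genomes are carried over unchanged, the best rating in the population cannot decrease from one generation to the next, provided the ratings themselves are consistent.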

Paper Structure (25 sections, 10 figures)

Introduction
Prompting in Text-to-Image Generation
Controlling Non-determinism
Customizing Large Text-to-Image Model
Our Prompting-Free Approach
Related Work
Text-to-Image Models Customization
Human Preference and Feedback
Method
Semantic Injection
Genetic Prompting Optimization
Kandinsky Bauhaus Style
Kandinsky Semantic Descriptive Guideline
Kandinsky Bauhaus Style Paintings
Experiment I: Artist Text-to-Image Model
...and 10 more sections

Figures (10)

Figure 1: Artist Model: Large Text-to-Image Model Customization with Semantic Injection.
Figure 2: Prompting Model + Artist Model: Genetic Prompting Optimization with Human Feedback in Five Iterations.
Figure 3: Genetic Prompting Optimization with Human Feedback.
Figure 4: Semantic Descriptive Guideline: Attribute-Values.
Figure 5: Kandinsky Text-to-Image Dataset.
...and 5 more figures
