
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan

TL;DR

The paper tackles personalizing text-to-image (T2I) generation by exploiting historical user interactions to rewrite prompts. It introduces the Personalized Image-Prompt (PIP) dataset, with over 300k prompts from 3,115 users, and a Personalized Prompt Rewriting (Personalized PR) pipeline that retrieves relevant history, rewrites the prompt, and generates images with a T2I model. Offline and online evaluations show that the approach outperforms baseline prompt rewriting methods, and ablations reveal that dense retrieval with one-shot in-context learning yields the best results. The work contributes a public dataset, a standardized evaluation framework, and a practical pathway toward user-aligned visual generation and personalized AI content creation.
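The retrieve-then-rewrite step can be illustrated with a minimal sketch. It assumes a hypothetical stand-in encoder (`toy_embed`, which hashes word tokens into a fixed-size vector) in place of the learned dense encoder a real retriever would use, ranks a user's past prompts by cosine similarity to the new prompt, and assembles a one-shot instruction for an LLM rewriter. The instruction template and the demonstration pair (`DEMO_INPUT`, `DEMO_OUTPUT`) are made up for illustration and are not the paper's actual prompt.

```python
import math
import re
import zlib
from collections.abc import Callable, Sequence

Vector = list[float]

# Hypothetical one-shot demonstration pair (not taken from the paper).
DEMO_INPUT = "a dog"
DEMO_OUTPUT = "a dog in a sunlit meadow, watercolor, soft pastel palette"


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in encoder: hashes lowercase word tokens into a
    fixed-size count vector. A real dense retriever would use a learned
    text encoder here instead."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, history: Sequence[str], k: int = 1,
             embed: Callable[[str], Vector] = toy_embed) -> list[str]:
    """Return the k past prompts of this user most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda h: cosine(q, embed(h)), reverse=True)
    return ranked[:k]


def build_one_shot_instruction(query: str, retrieved: Sequence[str]) -> str:
    """Combine retrieved history with one demonstration rewrite (one-shot)."""
    history_block = "\n".join(f"- {p}" for p in retrieved)
    return (
        "Rewrite the new prompt so that it reflects this user's preferences.\n"
        f"Relevant past prompts from this user:\n{history_block}\n\n"
        f"Example rewrite:\nInput: {DEMO_INPUT}\nOutput: {DEMO_OUTPUT}\n\n"
        f"Input: {query}\nOutput:"
    )


history = [
    "oil painting of a lighthouse by an old master",
    "cyberpunk city at night, neon lights",
    "impressionist oil painting portrait of a cat",
]
query = "a cat sleeping"
print(build_one_shot_instruction(query, retrieve(query, history, k=2)))
```

The resulting instruction would then be sent to an LLM, and its output passed to the T2I model. Keeping `embed` as a parameter lets the toy encoder be swapped for a real dense encoder without touching the ranking logic.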

Abstract

Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users. This process requires users to articulate their ideas in words that are both comprehensible to the models and accurately capture their vision, posing difficulties for many users. In this paper, we tackle this challenge by leveraging historical user interactions with the system to enhance user prompts. We propose a novel approach that involves rewriting user prompts based on a newly collected large-scale text-to-image dataset with over 300k prompts from 3115 users. Our rewriting model enhances the expressiveness and alignment of user prompts with their intended visual outputs. Experimental results demonstrate the superiority of our methods over baseline approaches, as evidenced in our new offline evaluation method and online tests. Our code and dataset are available at https://github.com/zzjchen/Tailored-Visions.

Paper Structure

This paper contains 22 sections, 1 equation, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Comparison between our personalized prompt rewriting method and the standard prompt rewriting method. Our technique excels at incorporating user preferences, such as "oil paintings by artists," while methods that lack a historical context frequently generate content that may not align with the user's desires.
  • Figure 2: Dataset creation process. We split our dataset into training and testing sets and summarize each prompt in the test set using ChatGPT.
  • Figure 3: Dataset statistics and distribution. Left: proportion of users by number of historical prompts. Each user has at least 18 historical prompts, as users with fewer prompts were excluded from the dataset. Right: proportion of prompts by prompt length. Best viewed in color.
  • Figure 4: Two examples from a user's history, each containing the image, prompt, user ID, image size, and URL.
  • Figure 5: Word cloud visualization of the top 250 keywords sampled from the PIP dataset.
  • ...and 9 more figures
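Figures 2 and 3 describe the dataset preparation: a train/test split, with every retained user having at least 18 historical prompts. A minimal sketch of that filtering and splitting follows. It assumes a hypothetical record schema of `(user_id, timestamp, prompt)` tuples and assumes that each user's most recent prompt(s) are held out as test queries and that the 18-prompt minimum applies to the remaining history; the exact split rule is not given in this summary.

```python
from collections import defaultdict
from collections.abc import Iterable

MIN_HISTORY = 18  # Figure 3: every retained user has at least 18 historical prompts


def filter_and_split(
    records: Iterable[tuple[str, int, str]],
    min_history: int = MIN_HISTORY,
    n_test: int = 1,
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Group (user_id, timestamp, prompt) records by user, drop users whose
    history would fall below min_history, and hold out each remaining user's
    n_test most recent prompts as test queries (hypothetical split rule)."""
    if n_test < 1:
        raise ValueError("n_test must be at least 1")
    by_user: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for user_id, timestamp, prompt in records:
        by_user[user_id].append((timestamp, prompt))

    train: dict[str, list[str]] = {}
    test: dict[str, list[str]] = {}
    for user_id, items in by_user.items():
        if len(items) - n_test < min_history:
            continue  # too little history to personalize from
        items.sort()  # chronological order (ties broken by prompt text)
        prompts = [prompt for _, prompt in items]
        train[user_id] = prompts[:-n_test]
        test[user_id] = prompts[-n_test:]
    return train, test
```

The ChatGPT summarization of test prompts shown in Figure 2 is an external API call and is left out of this sketch.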