Exploring Language Patterns of Prompts in Text-to-Image Generation and Their Impact on Visual Diversity

Maria-Teresa De Rosa Palmini, Eva Cetinic

TL;DR

This work analyzes over 6 million prompts from the Civiverse dataset to understand how user language in text-to-image prompts shapes visual outputs. By combining lexical metrics (TTR, SRS, CR, ENW), near-duplicate detection (MinHash), semantic topic modeling (MiniLM, UMAP, HDBSCAN, c-TF-IDF, GPT-4o labels), and visual diversity measures (Vendi Score, CLIP embeddings), the study reveals a clear trend toward lexical homogenization and stable semantic themes as user participation grows. It shows that higher lexical/semantic similarity among prompts correlates with more visually uniform images, highlighting a feedback loop between community norms and model-influenced outputs. The findings underscore the need for design and tooling that promote linguistic and thematic experimentation to enhance diversity and reduce biases in AI-generated imagery, with implications for fairness, creativity, and cultural representation.
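To make two of the text-side measures above concrete, here is a minimal sketch of a type-token ratio (TTR) and a MinHash-style near-duplicate estimate. It is an illustration under stated assumptions, not the authors' pipeline: the tokenizer, the number of hash functions, the use of salted SHA-256 as the hash family, and the example prompts (loosely modeled on the prompt pattern quoted in Figure 1) are all hypothetical choices made here.

```python
import hashlib

def tokenize(prompt):
    """Split a comma-separated tag prompt into lowercase word tokens (assumed tokenizer)."""
    return prompt.lower().replace(",", " ").split()

def type_token_ratio(prompts):
    """TTR: number of distinct tokens divided by total tokens across the prompts."""
    tokens = [tok for p in prompts for tok in tokenize(p)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def _hash(seed, token):
    """One member of a hash family: SHA-256 of 'seed:token', truncated to 64 bits."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the token set."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot build a MinHash signature for an empty prompt")
    return [min(_hash(seed, tok) for tok in token_set) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

# Hypothetical prompts sharing most of their tags.
p1 = "1 girl, looking at viewer, glossy lips, masterpiece, best quality"
p2 = "1 girl, looking at viewer, soft skin, masterpiece, best quality"
p3 = "ancient ruined castle under stormy clouds"

sim_near = estimated_jaccard(minhash_signature(tokenize(p1)), minhash_signature(tokenize(p2)))
sim_far = estimated_jaccard(minhash_signature(tokenize(p1)), minhash_signature(tokenize(p3)))
print(f"TTR over all three prompts: {type_token_ratio([p1, p2, p3]):.2f}")
print(f"near-duplicate pair: {sim_near:.2f}, unrelated pair: {sim_far:.2f}")
```

A lower TTR over a month of prompts would indicate more token reuse, and a high estimated Jaccard similarity flags a pair as near-duplicates. At the scale of millions of prompts, signatures like these are normally combined with locality-sensitive hashing rather than compared pairwise.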

Abstract

Following the initial excitement, Text-to-Image (TTI) models are now being examined more critically. While much of the discourse has focused on biases and stereotypes embedded in large-scale training datasets, the sociotechnical dynamics of user interactions with these models remain underexplored. This study examines the linguistic and semantic choices users make when crafting prompts and how these choices influence the diversity of generated outputs. Analyzing over six million prompts from the Civiverse dataset on the CivitAI platform across seven months, we categorize users into three groups based on their levels of linguistic experimentation: consistent repeaters, occasional repeaters, and non-repeaters. Our findings reveal that as user participation grows over time, prompt language becomes increasingly homogenized through the adoption of popular community tags and descriptors, with repeated prompts comprising 40-50% of submissions. At the same time, semantic similarity and topic preferences remain relatively stable, emphasizing common subjects and surface aesthetics. Using Vendi scores to quantify visual diversity, we demonstrate a clear correlation between lexical similarity in prompts and the visual similarity of generated images, showing that linguistic repetition reinforces less diverse representations. These findings highlight the significant role of user-driven factors in shaping AI-generated imagery, beyond inherent model biases, and underscore the need for tools and practices that encourage greater linguistic and thematic experimentation within TTI systems to foster more inclusive and diverse AI-generated content.
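The Vendi Score mentioned above can be sketched as follows: given a similarity matrix K over n items with K[i][i] = 1, it is the exponential of the Shannon entropy of the eigenvalues of K/n, so it ranges from 1 (all items identical) to n (all items mutually dissimilar). The sketch below assumes a cosine-similarity kernel and uses tiny toy vectors in place of CLIP image embeddings; the kernel actually used in the study is not stated in this summary. To stay dependency-free it includes a small Jacobi eigenvalue routine, where a numerical library would normally be used instead.

```python
import math

def cosine_kernel(vectors):
    """Pairwise cosine similarity matrix K, with K[i][i] == 1 (assumed kernel)."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def vendi_score(vectors):
    """exp of the Shannon entropy of the eigenvalues of K / n."""
    n = len(vectors)
    kernel = cosine_kernel(vectors)
    eigenvalues = symmetric_eigenvalues([[x / n for x in row] for row in kernel])
    entropy = -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)
    return math.exp(entropy)

# Toy stand-ins for image embeddings (hypothetical data).
identical = [[1.0, 0.0, 0.0]] * 3
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mixed = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]

for name, vecs in [("identical", identical), ("mixed", mixed), ("orthogonal", orthogonal)]:
    print(f"{name}: Vendi score = {vendi_score(vecs):.3f}")
```

Read this way, the score behaves like an effective number of distinct images: a set of near-identical outputs scores close to 1, which is the sense in which repeated prompts are associated with less diverse imagery.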

Paper Structure

This paper contains 40 sections, 18 figures, 31 tables.

Figures (18)

  • Figure 1: Images from the Civiverse dataset, generated with the Animagine XL v3.1 model, using variations of a common prompt pattern: 1 girl, 30 years, looking at viewer, leaning forward, head turned, glossy lips, soft skin, elegant makeup, seductive smile, masterpiece, best quality. The first row shows images from prompts with 20 identical tokens, while the second row uses prompts with 9 identical tokens.
  • Figure 2: Monthly distribution of new, abandoning, and retained users (left) and monthly distribution of users based on their categorization into consistent, occasional or non-repeaters of prompts (right).
  • Figure 3: Monthly changes of the values of TTR, ENW, SRS, and CR scores calculated from the prompts of Civiverse dataset.
  • Figure 4: Semantic Trends Over Time: (a) Total Prompts, (b) Topics, and (c) Topics-to-Prompts Ratio.
  • Figure 5: UMAP visualization of MiniLM-L6-v2 embeddings of prompt specifiers and HDBSCAN-identified topics for the Consistent Repeaters user category.
  • ...and 13 more figures