Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

Matthew Zheng, Enis Simsar, Hidir Yesiltepe, Federico Tombari, Joel Simon, Pinar Yanardag

TL;DR

STYLEBREEDER addresses how to map and democratize crowd-generated artistic styles in text-to-image diffusion by releasing a large Artbreeder-derived dataset (6.8M images, 1.8M prompts from 95K users) and implementing a style-centric pipeline for discovery, personalization, and recommendation. It couples style embeddings with clustering, multiple personalization methods (TI, LoRA, Custom Diffusion, EDLoRA), and a Style Atlas to distribute 100 LoRAs under CC0, enabling broad community access. The work demonstrates that extracted style representations enable targeted generation and style-based recommendations, revealing both unique crowd-driven aesthetics and practical personalization capabilities. The dataset, code, and models are made publicly available to foster reproducible research and open-ended exploration of digital creativity.

Abstract

Text-to-image models are becoming increasingly popular, revolutionizing the landscape of digital art creation by enabling highly detailed and creative visual content generation. These models have been widely employed across various domains, particularly in art generation, where they facilitate a broad spectrum of creative expression and democratize access to artistic creation. In this paper, we introduce STYLEBREEDER, a comprehensive dataset of 6.8M images and 1.8M prompts generated by 95K users on Artbreeder, a platform that has emerged as a significant hub for creative exploration with over 13M users. We introduce a series of tasks with this dataset aimed at identifying diverse artistic styles, generating personalized content, and recommending styles based on user interests. By documenting user-generated styles that transcend conventional categories like 'cyberpunk' or 'Picasso,' we explore the potential for unique, crowd-sourced styles that could provide deep insights into the collective creative psyche of users worldwide. We also evaluate different personalization methods to enhance artistic expression and introduce a style atlas, making these models available in LoRA format for public use. Our research demonstrates the potential of text-to-image diffusion models to uncover and promote unique artistic expressions, further democratizing AI in art and fostering a more diverse and inclusive artistic community. The dataset, code, and models are available at https://stylebreeder.github.io under a Public Domain (CC0) license.

Paper Structure (28 sections, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Our dataset comprises 6.8M images generated by 95,000 unique users, accompanied by 1.8M text prompts, spanning July 2022 to May 2024. It includes detailed metadata such as Positive Prompt, Negative Prompt, UserID, Timestamp, and Image Size. Additionally, we supply model-related hyperparameters, including Model Type, Seed, Step, and CFG Scale. Note that the number of prompts differs from the number of images because different images can be generated from the same text prompt when varying hyperparameters. We also offer further metadata such as Cluster ID, along with scores for Prompt NSFW, Image NSFW, and Toxicity computed using the state-of-the-art models Detoxify and NSFW-Detector.
  • Figure 2: Most unique users have generated fewer than 1000 images. The average prompt contains fewer than 60 words. Common keywords in positive prompts, such as 'painting', 'realistic', and 'digital', reveal semantic information about the style of desired images. Common keywords in negative prompts, such as 'ugly' and 'deformed', indicate undesired features of generated images.
  • Figure 3: (a) Predicted NSFW scores on images across LAION [schuhmann2022laion], Artbench [liao2022artbench], DiffusionDB [wang2022diffusiondb], TWIGMA [chen2024twigma], and STYLEBREEDER (Ours), computed with NSFW-Detector (a higher score indicates more NSFW content). (b) Predicted NSFW, Toxicity, Severe Toxicity, Identity Attack, Insult, and Threat scores on STYLEBREEDER text prompts, computed with Detoxify.
  • Figure 4: (a) User-generated images from 10 random clusters showcasing a diverse range of styles. (b) Sample images from style-based clustering vs. traditional clustering using DINO features show that style-based clustering captures the stylistic content while traditional clustering focuses on objects. (c) Visualization of the clusters, projected into 2D with t-SNE [t-sne], with each cluster represented by a unique color according to its assignment by K-Means++ [kmeans++]. This depiction highlights that while many styles are closely related, some distinct styles are noticeably distant from the main clusters.
  • Figure 5: (a) An illustration of our pipeline: we cluster input images by stylistic similarity and employ a personalization method, such as LoRA, to train personalized models aligned with specific styles. (b) Users can download style LoRA models from the Style Atlas platform. (c) Users can generate personalized images using LoRA models, where Style S* represents an example image from the cluster. (d) We recommend top styles to users based on the images they have previously generated. This personalized approach helps tailor style suggestions to each user's unique preferences.
  • ...and 2 more figures
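
The captions for Figures 4 and 5 describe clustering images by style with K-Means++ and recommending styles from a user's previously generated images. The sketch below illustrates that idea on toy data using only the Python standard library. It is not the authors' implementation: the captions do not say how style embeddings are extracted or how recommendations are scored, so the plain-list embeddings, the cosine-similarity ranking against the user's mean embedding, and the function names `kmeans_pp_init`, `kmeans`, and `recommend_styles` are all assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-Means++ seeding: pick centroids that tend to be far apart."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by squared distance to its nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # All points coincide with a centroid; fall back to uniform choice.
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm with K-Means++ seeding. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend_styles(user_embeddings, centroids, top_n=3):
    """Rank style clusters by similarity to the user's mean embedding.

    Assumption: a user's taste is summarized by the mean of the style
    embeddings of the images they generated previously.
    """
    n = len(user_embeddings)
    mean = [sum(col) / n for col in zip(*user_embeddings)]
    ranked = sorted(
        range(len(centroids)),
        key=lambda j: cosine(mean, centroids[j]),
        reverse=True,
    )
    return ranked[:top_n]
```

Calling `kmeans(embeddings, k)` on a list of embedding vectors returns cluster centroids and per-image labels; passing a user's image embeddings and those centroids to `recommend_styles` returns the indices of the closest style clusters. In practice the embeddings would come from a style feature extractor and the clustering from an optimized library, both of which are outside the scope of this sketch.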