Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models

Deepak Sridhar, Nuno Vasconcelos

TL;DR

This paper proposes a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder, including different versions of the SD model.

Abstract

Diffusion models have recently surpassed GANs in image synthesis and editing, offering superior image quality and diversity. However, achieving precise control over attributes in generated images remains a challenge. Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects). However, this approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts. These adapters are model-specific and require retraining for different architectures, such as Stable Diffusion (SD) v1.5 and SD-XL. In this paper, we propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder, including different versions of the SD model. We refer to our method as Prompt Sliders. Besides learning new concepts, we also show that Prompt Sliders can be used to erase undesirable concepts such as artistic styles or mature content. Our method is 30% faster than using LoRAs because it eliminates the need to load and unload adapters and introduces no additional parameters aside from the target concept text embedding. Each concept embedding only requires 3KB of storage compared to the 8922KB or more required for each LoRA adapter, making our approach more computationally efficient. Project Page: https://deepaksridhar.github.io/promptsliders.github.io/
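The storage figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes that a concept is stored as a single token embedding of width 768 (the hidden size of the CLIP ViT-L/14 text encoder used by SD v1.5) in 32-bit floats. Under those assumptions the embedding comes out at 3 KB, matching the abstract; the 8922 KB LoRA size is taken directly from the abstract.

```python
# Back-of-the-envelope check of the storage claim in the abstract.
# ASSUMPTIONS (not stated in the abstract): one learned token per concept,
# embedding width 768 (CLIP ViT-L/14 text encoder), float32 storage.
EMBED_DIM = 768          # assumed text-encoder hidden size
BYTES_PER_FLOAT32 = 4    # assumed storage precision
NUM_TOKENS = 1           # assumed number of learned tokens per concept

LORA_SIZE_KB = 8922      # per-adapter size quoted in the abstract


def embedding_size_kb(num_tokens: int = NUM_TOKENS,
                      dim: int = EMBED_DIM,
                      bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Return the raw size of the concept embedding in KB (1 KB = 1024 bytes)."""
    return num_tokens * dim * bytes_per_value / 1024


embedding_kb = embedding_size_kb()
ratio = LORA_SIZE_KB / embedding_kb

print(f"Concept embedding: {embedding_kb:.1f} KB")
print(f"LoRA adapter:      {LORA_SIZE_KB} KB")
print(f"Size ratio:        about {ratio:.0f}x")
```

If a concept used more than one token, or a different precision, the embedding size would scale accordingly, but it would remain orders of magnitude smaller than the quoted LoRA size.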

Paper Structure (21 sections, 7 equations, 11 figures, 1 table)


Figures (11)

  • Figure 1: Prompt Sliders for fine-grained control of attributes with textual inversion. Each row in the figure shows a corresponding concept depicted on top of the image along with its control strength $\alpha$ that enhances the target concept as the guidance strength increases. The prompts for the images from top to bottom are listed as follows (in order) - "A portrait of an woman with a warm smile", "A woman with voluminous hair cascading over her shoulders, posing for a fashion shoot", "A fantasy character", "A person caught off guard by unexpected news".
  • Figure 1: Comparison of CLIP-score and inference times for LoRA based sliders against the proposed prompt sliders.
  • Figure 2: Prompt sliders for encoding abstract concepts such as cartoon, clay, pixar and sculpture styles. The prompts used for generating the images are as follows (left to right in order) - "A superhero character in action, with bold lines and bright colors", "An artist working", "A funny and charming robot exploring a futuristic city", "A famous historical figure".
  • Figure 3: Left: Training of textual Prompt Sliders. Right: Training of visual Prompt Sliders.
  • Figure 4: More qualitative results of Prompt Sliders depicting various concepts. The corresponding prompts for the images in the figure from left to right in the top row, and from left to right in the bottom row are as follows. "A kitchen", "A building in a city", "A vibrant coral reef teeming with marine life, seen through crystal-clear water", "A chef in a kitchen, skillfully preparing a gourmet dish".
  • ...and 6 more figures