ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral, Joost van de Weijer

TL;DR

ColorPeel tackles precise color control in diffusion-based T2I generation by disentangling color from shape. It learns target-color prompts from sets of basic geometric objects generated in the desired color and introduces a cross-attention alignment loss that enforces consistent localization of color and shape, enabling accurate color generation and transfer. The method generalizes to textures and materials and supports image editing and color interpolation. Experiments show higher color fidelity and better transferability than existing personalization baselines, and human studies confirm the perceptual improvements. The work makes color control in diffusion models more practical, with potential applications in design and other creative tasks.
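Color interpolation between two learned color prompts can be illustrated as a blend of their token embeddings. The sketch below assumes simple linear interpolation between two embedding vectors (the names `lerp_embedding`, `interpolation_sweep`, and the toy vectors are hypothetical). It is not necessarily how ColorPeel performs interpolation. In a real pipeline, each blended vector would be written into the text encoder's embedding table before sampling.

```python
from typing import List, Sequence

Embedding = List[float]


def lerp_embedding(e_a: Sequence[float], e_b: Sequence[float], alpha: float) -> Embedding:
    """Return (1 - alpha) * e_a + alpha * e_b for alpha in [0, 1]."""
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


def interpolation_sweep(e_a: Sequence[float], e_b: Sequence[float], steps: int) -> List[Embedding]:
    """Evenly spaced blends from e_a (first entry) to e_b (last entry)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [lerp_embedding(e_a, e_b, k / (steps - 1)) for k in range(steps)]


if __name__ == "__main__":
    # Toy stand-ins for two learned color-token embeddings (hypothetical values).
    e_red = [0.0, 2.0, -1.0]
    e_blue = [1.0, 0.0, 1.0]
    for vec in interpolation_sweep(e_red, e_blue, steps=3):
        print(vec)
```

Linear blending is the simplest choice. Spherical interpolation is a common alternative when embedding norms matter.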

Abstract

Text-to-Image (T2I) generation has advanced significantly with the advent of diffusion models, which can produce remarkable images from textual prompts. Current T2I models let users specify object colors with linguistic color names. However, these names cover broad color ranges, which makes precise color matching difficult. To tackle this challenging task, which we call color prompt learning, we propose to learn specific color prompts tailored to user-selected colors. Existing T2I personalization methods tend to entangle color and shape. To overcome this, we generate several basic geometric objects in the target color, which allows color and shape to be disentangled during color prompt learning. Our method, ColorPeel, helps T2I models peel off the novel color prompts from these colored shapes. Our experiments demonstrate the efficacy of ColorPeel in achieving precise color generation with T2I models. Furthermore, we generalize ColorPeel to learn abstract attribute concepts such as textures and materials. Our findings represent a significant step toward improving the precision and versatility of T2I models, offering new opportunities for creative applications and design tasks. Our project is available at https://moatifbutt.github.io/colorpeel/.
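The instance-generation step described above (basic geometric objects rendered in a user-selected color, each paired with a caption) can be sketched in plain Python. This is an illustrative sketch only. It assumes three hypothetical flat 2D shapes (circle, square, triangle) on a white canvas and a hypothetical caption template built from placeholder tokens `<s{i}>` and `<c{j}>`. ColorPeel's actual shapes, its rendering (the paper also uses 3D geometries), and its prompt templates may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # rows of RGB pixels


def in_circle(px: float, py: float, n: int) -> bool:
    c, r = n / 2, 0.35 * n
    return (px - c) ** 2 + (py - c) ** 2 <= r * r


def in_square(px: float, py: float, n: int) -> bool:
    return 0.2 * n <= px <= 0.8 * n and 0.2 * n <= py <= 0.8 * n


def in_triangle(px: float, py: float, n: int) -> bool:
    # Isosceles triangle: apex at the top center, base near the bottom edge.
    top, base = 0.15 * n, 0.85 * n
    if not top <= py <= base:
        return False
    half_width = 0.35 * n * (py - top) / (base - top)
    return abs(px - n / 2) <= half_width


SHAPES: Dict[str, Callable[[float, float, int], bool]] = {
    "circle": in_circle,
    "square": in_square,
    "triangle": in_triangle,
}


def render_shape(shape: str, color: RGB, size: int = 64,
                 background: RGB = (255, 255, 255)) -> Image:
    """Fill one shape with a flat target color, testing each pixel at its center."""
    inside = SHAPES[shape]
    return [
        [color if inside(x + 0.5, y + 0.5, size) else background for x in range(size)]
        for y in range(size)
    ]


def build_instances(colors: Sequence[RGB], shapes: Sequence[str],
                    size: int = 64) -> List[Tuple[Image, str]]:
    """Pair every (shape, color) image with a caption that uses separate shape
    and color tokens, so each color token appears alongside every shape."""
    pairs = []
    for j, color in enumerate(colors, start=1):
        for i, shape in enumerate(shapes, start=1):
            caption = f"a photo of a <s{i}> in <c{j}> color"  # hypothetical template
            pairs.append((render_shape(shape, color, size), caption))
    return pairs


if __name__ == "__main__":
    blue = (30, 80, 200)  # example user-selected RGB triplet
    for image, caption in build_instances([blue], ["circle", "square", "triangle"]):
        print(caption, image[32][32])
```

Because the same color token is seen with several different geometries, the shape tokens can absorb the geometry while the color token is left to capture only the color.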

Paper Structure (24 sections, 5 equations, 23 figures, 3 tables)

Figures (23)

  • Figure 1: Overview of ColorPeel for personalized color prompt learning. Given RGB triplets or color coordinates, ColorPeel generates basic 2D or 3D geometries in the target colors for color learning. This facilitates the disentanglement of color and shape concepts, allowing precise color usage in image generation.
  • Figure 2: Analyzing Color Fidelity and Transferability. Given RGB values (of a blue color) in the text prompt, Stable Diffusion fails to generate the desired objects in the specified colors, and its color fidelity is also inconsistent when it is given specific color names. By comparison, the seminal concept-learning methods Textual Inversion and DreamBooth generate text-guided objects in the specified colors; however, they are single-concept learning baselines and also fail to generate consistently colored objects. Custom Diffusion, a multi-concept learning baseline, intermixes the colors and reduces sample variation, which leads to unintended outcomes.
  • Figure 3: Illustration of our method ColorPeel. First, given user-provided RGB values or color coordinates, instance images are generated together with their text templates. Next, we introduce new modifier tokens, i.e., $\mathit{s}^{*}_i$ and $\mathit{c}^{*}_i$, which correspond to shapes and colors respectively, to disentangle shape from color. Following Custom Diffusion, the key and value projection matrices in the cross-attention layers of the diffusion model are optimized together with the modifier tokens during training. To improve learning, we introduce a cross-attention alignment loss that enforces consistent localization of the color and shape cross-attention maps.
  • Figure 4: Cross-attention visualization. We compare the cross-attention maps of Custom Diffusion and ColorPeel at the last timestep. Our method precisely learns color from the given concept while avoiding overlap with the background, which is one of the main causes of color intermixing in the baseline.
  • Figure 5: Qualitative results of ColorPeel in single color and multi-color compositions.
  • ...and 18 more figures
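The cross-attention alignment described in the Figure 3 and Figure 4 captions can be illustrated with a toy computation. This is a minimal sketch under stated assumptions, not ColorPeel's implementation. It computes attention probabilities softmax(QKᵀ/√d) between image positions (queries) and text tokens (keys), takes the spatial map of a color token and a shape token, min-max normalizes both, and penalizes their mean squared difference. The MSE form and the function names (`cross_attention_probs`, `token_map`, `alignment_loss`) are hypothetical, and the paper's exact loss may differ. The key/value projections that Custom Diffusion fine-tunes are not modeled here; the keys are simply given.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_probs(queries: Matrix, keys: Matrix) -> Matrix:
    """Row i holds softmax(q_i . k_j / sqrt(d)) over text tokens j.

    queries: one row per flattened image position (from image features).
    keys: one row per text token (projected prompt embeddings, including
    the modifier tokens).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def token_map(probs: Matrix, token_idx: int) -> Vector:
    # Column token_idx: how strongly each image position attends to that token.
    return [row[token_idx] for row in probs]


def minmax_normalize(values: Vector, eps: float = 1e-8) -> Vector:
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


def alignment_loss(probs: Matrix, color_idx: int, shape_idx: int) -> float:
    """Hypothetical alignment term: MSE between normalized color and shape maps."""
    a_color = minmax_normalize(token_map(probs, color_idx))
    a_shape = minmax_normalize(token_map(probs, shape_idx))
    return sum((c - s) ** 2 for c, s in zip(a_color, a_shape)) / len(a_color)


if __name__ == "__main__":
    # Toy example: 4 image positions and 3 text tokens with 2-dim keys.
    queries = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.0]]
    keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # tokens 0 and 1 share a key
    probs = cross_attention_probs(queries, keys)
    print(alignment_loss(probs, color_idx=0, shape_idx=1))  # prints 0.0
    print(alignment_loss(probs, color_idx=0, shape_idx=2))  # positive: maps differ
```

Normalizing each map before comparing them makes the term depend on where each token attends rather than on the absolute attention mass. This matches the stated goal of consistent color and shape localization.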