Color Conditional Generation with Sliced Wasserstein Guidance

Alexander Lobashev, Maria Larchenko, Dmitry Guskov

TL;DR

This work tackles the challenge of color-conditioned image generation by enabling precise color control through a training-free diffusion-guidance mechanism. It introduces SW-Guidance, which optimizes a differentiable $SW_1$ distance between the generated image's color distribution and a reference palette during sampling, thereby aligning colors while preserving the textual prompt's semantics. The approach yields state-of-the-art color similarity to reference palettes on SD1.5 and SDXL without transferring unwanted textures, and remains compatible with additional conditioning like ControlNets. Theoretical guarantees connect SW convergence to moment convergence, while extensive ablations and qualitative results validate effectiveness and robustness across architectures and datasets. The work is impactful for color-accurate image synthesis in creative and design tasks, offering a practical, training-free path to palette-guided generation.

Abstract

We propose SW-Guidance, a training-free approach for image generation conditioned on the color distribution of a reference image. While it is possible to generate an image with fixed colors by first creating an image from a text prompt and then applying a color style transfer method, this approach often results in semantically meaningless colors in the generated image. Our method solves this problem by modifying the sampling process of a diffusion model to incorporate the differentiable Sliced 1-Wasserstein distance between the color distribution of the generated image and the reference palette. Our method outperforms state-of-the-art techniques for color-conditional generation in terms of color similarity to the reference, producing images that not only match the reference colors but also maintain semantic coherence with the original text prompt. Our source code is available at https://github.com/alobashev/sw-guidance/.
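To make the distance in the abstract concrete, here is a minimal NumPy sketch of a Monte Carlo sliced 1-Wasserstein estimate between two sets of RGB pixels. It is an illustration under stated assumptions, not the authors' implementation: the function name `sliced_w1`, the number of projections, and the requirement that both pixel sets have the same size are all choices made for this example. Inside a diffusion sampler the same computation would presumably be written in an autodiff framework so that its gradient can steer the sampling step.

```python
import numpy as np

def sliced_w1(x, y, n_proj=64, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance.

    x, y: arrays of shape (n, d) holding n colors each (d = 3 for RGB).
    Assumes both sets contain the same number of points (hypothetical
    simplification; unequal sizes would need quantile interpolation).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch expects equally sized point sets")

    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere in color space.
    dirs = rng.normal(size=(n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # Project onto each direction and sort: for equal-size samples the
    # 1D W1 distance is the mean absolute gap between sorted projections.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean(np.abs(px - py)))

# Usage: compare a random "image" palette with a red-shifted copy.
rng = np.random.default_rng(1)
pixels = rng.random((500, 3))
shifted = pixels + np.array([0.1, 0.0, 0.0])
print(sliced_w1(pixels, shifted))
```

Sorting is what keeps each one-dimensional distance cheap, which is why slicing is attractive compared with solving a full optimal transport problem in color space.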

Paper Structure

This paper contains 12 sections, 7 theorems, 46 equations, 17 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

Let $F$ and $G$ be two cumulative distribution functions. Then

$$\int_{\mathbb{R}} |F(x) - G(x)| \, dx = \int_0^1 |F^{-1}(t) - G^{-1}(t)| \, dt,$$

where $F^{-1}$ and $G^{-1}$ are the quantile functions (inverse CDFs) of $F$ and $G$, respectively.
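A statement of this shape is usually the identity that the area between two CDFs equals the area between their quantile functions, that is, $\int_{\mathbb{R}} |F(x) - G(x)| \, dx = \int_0^1 |F^{-1}(t) - G^{-1}(t)| \, dt$. Assuming that is the intended reading, the sketch below checks it numerically for empirical distributions. The helper names `w1_quantile` and `w1_cdf` are hypothetical, and equal sample sizes are assumed so that the quantile side reduces to a mean over sorted samples.

```python
import numpy as np

def w1_quantile(a, b):
    """Quantile side: for equal-size samples, the integral of
    |F^{-1} - G^{-1}| over (0, 1) is the mean gap between sorted samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def w1_cdf(a, b):
    """CDF side: integrate |F - G| exactly. Both empirical CDFs are
    piecewise constant between consecutive points of the merged sample."""
    a_sorted, b_sorted = np.sort(a), np.sort(b)
    pts = np.sort(np.concatenate([a, b]))
    widths = np.diff(pts)
    # CDF values on each interval [pts[i], pts[i+1]).
    F = np.searchsorted(a_sorted, pts[:-1], side="right") / len(a)
    G = np.searchsorted(b_sorted, pts[:-1], side="right") / len(b)
    return float(np.sum(np.abs(F - G) * widths))

# Usage: the two sides should agree up to floating-point error.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(0.5, 2.0, size=200)
print(w1_quantile(a, b), w1_cdf(a, b))
```

The quantile form is the one that matters in practice: it turns a one-dimensional Wasserstein distance into a sort followed by an elementwise difference.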

Figures (17)

  • Figure 1: Color-conditional generation by Sliced Wasserstein Guidance achieves unprecedented match with a reference palette.
  • Figure 1: Example of right continuous non-decreasing function.
  • Figure 2: General scheme of the Sliced Wasserstein Guidance for a latent diffusion model with decoder $D$ and feature extractor $\Psi$.
  • Figure 2: Comparison with stylized generation methods. Examples from the test set. All images are generated by RealVisXL, except for RB-Modulation, which runs on Stable Cascade. Other methods show a greater mismatch in color distributions and also often transfer composition details such as a forest (first row), a field of flowers (second row), a bouquet (third row), mountains (fourth row), and a cloudy sky with mountains (last row).
  • Figure 3: Comparison with stylized generation methods. All images are generated from the text prompt "a gorgeous photo of old-fashioned lighthouse" by RealVisXL, except for RB-Modulation, which runs on Stable Cascade. Other methods achieve a good match with the target palette but also transfer high-level features such as brush strokes or a field of wheat (see Table \ref{tab:table-content-style-sdxl} for metrics).
  • ...and 12 more figures

Theorems & Definitions (12)

  • Proposition 1
  • Lemma 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Proposition 4
  • proof
  • ...and 2 more