Color Conditional Generation with Sliced Wasserstein Guidance
Alexander Lobashev, Maria Larchenko, Dmitry Guskov
TL;DR
This work addresses color-conditioned image generation with a training-free diffusion guidance mechanism. It introduces SW-Guidance, which minimizes a differentiable Sliced 1-Wasserstein ($SW_1$) distance between the generated image's color distribution and a reference palette during sampling, aligning colors while preserving the semantics of the text prompt. The approach achieves state-of-the-art color similarity to reference palettes on SD1.5 and SDXL without transferring unwanted textures, and it remains compatible with additional conditioning such as ControlNets. Theoretical guarantees connect SW convergence to moment convergence, and ablations and qualitative results support its effectiveness and robustness across architectures and datasets. The method offers a practical, training-free path to palette-guided generation for color-accurate image synthesis in creative and design tasks.
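For reference, the $SW_1$ distance named in this summary has a standard textbook definition, sketched below. The notation is assumed for illustration and is not taken from the paper: $\mu$ and $\nu$ are the color distributions of the generated and reference images, treated here as distributions over a 3-D color space, $\theta_\#$ is the push-forward under projection onto the direction $\theta$, and $\sigma$ is the uniform measure on the unit sphere.

$$
SW_1(\mu, \nu) = \int_{S^{2}} W_1\left(\theta_\# \mu,\ \theta_\# \nu\right) d\sigma(\theta)
\approx \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n} \sum_{i=1}^{n} \left| x^{(k)}_{(i)} - y^{(k)}_{(i)} \right|
$$

The approximation on the right assumes $K$ random directions $\theta_k$ and $n$ pixels per image, where $x^{(k)}_{(i)}$ and $y^{(k)}_{(i)}$ are the $i$-th smallest projections of the two pixel sets onto $\theta_k$. It relies on the fact that, in one dimension, the 1-Wasserstein distance between two equal-sized empirical samples is the mean absolute difference of their sorted values.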
Abstract
We propose SW-Guidance, a training-free approach for image generation conditioned on the color distribution of a reference image. While it is possible to generate an image with fixed colors by first creating an image from a text prompt and then applying a color style transfer method, this approach often results in semantically meaningless colors in the generated image. SW-Guidance addresses this problem by modifying the sampling process of a diffusion model to incorporate the differentiable Sliced 1-Wasserstein distance between the color distribution of the generated image and the reference palette. It outperforms state-of-the-art techniques for color-conditional generation in terms of color similarity to the reference, producing images that not only match the reference colors but also maintain semantic coherence with the original text prompt. Our source code is available at https://github.com/alobashev/sw-guidance/.
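To make the central quantity concrete, the following is a minimal sketch of a Monte Carlo estimate of the Sliced 1-Wasserstein distance between two sets of pixel colors. It is not the authors' implementation: the function names `random_unit_vector` and `sliced_w1`, the parameters `n_projections` and `seed`, the use of RGB triples, and the requirement of equal-sized pixel sets are all assumptions made for illustration. It uses only the Python standard library and computes the value alone; using it as guidance during sampling would require the same computation in an automatic differentiation framework.

```python
import math
import random


def random_unit_vector(rng, dim=3):
    """Draw a direction uniformly on the unit sphere (normalised Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # guard against a degenerate near-zero draw
            return [c / norm for c in v]


def sliced_w1(pixels_a, pixels_b, n_projections=64, seed=0):
    """Estimate SW_1 between two equal-sized lists of colour triples.

    Assumption: both inputs hold the same number of pixels, so the 1-D
    Wasserstein distance reduces to comparing sorted projections.
    """
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("pixel sets must be non-empty and of equal size")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(rng)
        # Project every colour onto the random direction, then sort.
        proj_a = sorted(sum(t * c for t, c in zip(theta, p)) for p in pixels_a)
        proj_b = sorted(sum(t * c for t, c in zip(theta, p)) for p in pixels_b)
        # 1-D W_1 for equal-sized samples: mean absolute gap of sorted values.
        total += sum(abs(a - b) for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return total / n_projections


if __name__ == "__main__":
    reference = [(0.1, 0.2, 0.3), (0.9, 0.8, 0.7), (0.5, 0.5, 0.5)]
    generated = [(r + 0.2, g, b) for r, g, b in reference]  # red channel shifted
    print(sliced_w1(reference, generated))
```

Because the estimate depends only on the set of colors and not on where pixels sit in the image, it compares palettes without comparing spatial structure, which fits the stated aim of matching reference colors without transferring textures.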
