Fine color guidance in diffusion models and its application to image compression at extremely low bitrates

Tom Bordin, Thomas Maugey

TL;DR

This work introduces a training-free fine color guidance for diffusion models to steer outputs toward a predefined color map, applicable to both pixel- and latent-diffusion models. The color map is modeled as the low-frequency content via a 2D-DCT representation, and a new guidance term is derived by unrolling classifier guidance specifically for color conditioning. The method is integrated into a compression framework (CoCliCo) to enable semantic-plus-color reconstruction at ultra-low bitrates, outperforming training-free baselines and approaching semantic-compression methods in preserving color fidelity and realism. The results demonstrate improved color fidelity, perceptual realism, and semantic preservation, with the ability to operate without retraining and to scale across different diffusion frameworks, including latent-space models. This has practical implications for efficient, high-quality image compression and controlled image generation in resource-constrained settings.
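
To make the "low-frequency content via a 2D-DCT representation" idea concrete, here is a minimal sketch of how a coarse color map could be extracted from one image channel by keeping only the lowest-frequency DCT coefficients. The function names, the `keep` parameter, and the per-channel treatment are assumptions for illustration; the paper's actual color space and number of retained coefficients are not stated in this summary.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def low_frequency_color_map(channel, keep):
    """Keep the keep x keep lowest-frequency 2D-DCT coefficients and invert.

    `channel` is a 2D list of floats (one color channel); `keep` is a
    hypothetical knob controlling how coarse the color map is.
    """
    h, w = len(channel), len(channel[0])
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT: C = D_h X D_w^T
    coeffs = matmul(matmul(dh, channel), transpose(dw))
    # Zero out everything except the low-frequency corner.
    truncated = [[c if (u < keep and v < keep) else 0.0
                  for v, c in enumerate(row)]
                 for u, row in enumerate(coeffs)]
    # Inverse 2D DCT: X = D_h^T C D_w (valid because D is orthonormal)
    return matmul(matmul(transpose(dh), truncated), dw)
```

With `keep=1` only the DC coefficient survives, so the color map collapses to the channel mean; larger values of `keep` retain progressively finer color variations at a higher coding cost.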

Abstract

This study addresses the challenge of controlling the global color aspect of images generated with a diffusion model, without training or fine-tuning. We rewrite the guidance equations so that the outputs are closer to a known color map without hindering the quality of the generation. Our method leads to new guidance equations. We show that, in the color guidance context, the scaling of the guidance should not decrease but should remain high throughout the diffusion process. As a second contribution, we apply our guidance in a compression framework, where we combine semantic and general color information about the image to decode images at low cost. We show that our method effectively improves the fidelity and realism of compressed images at extremely low bit rates compared to other classical or more semantically oriented approaches.
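
The claim about guidance scaling can be illustrated with a small numerical sketch. It contrasts a scale proportional to $\sqrt{1-\bar\alpha_t}$, which vanishes near the end of denoising, with a scale that stays constant. The linear beta schedule, the function names, and the constant schedule itself are assumptions chosen for illustration; they are not the paper's exact guidance equations.

```python
import math

def alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The DDPM-style default values are an assumption, not taken from the paper.
    Requires num_steps >= 2.
    """
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def decaying_scale(s, abar_t):
    """Scale proportional to sqrt(1 - abar_t): shrinks toward 0 as t -> 0."""
    return s * math.sqrt(1.0 - abar_t)

def constant_scale(s, abar_t):
    """Hypothetical schedule that keeps the guidance strength high at every step."""
    return s

abar = alpha_bar(1000)
for t in (999, 500, 100, 0):
    print(t, round(decaying_scale(1.0, abar[t]), 4), constant_scale(1.0, abar[t]))
```

Under these assumptions the decaying scale is close to 1 at the noisiest step and drops to about 0.01 at the final step, so a color constraint would barely be enforced at the end of the process, whereas the constant scale applies the same correction throughout.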

Paper Structure

This paper contains 30 sections, 33 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: Two images generated with a diffusion model, conditioned on the same semantics (rocky mountain and cloudy sky). Unlike the left image, the right image is obtained by guiding the diffusion model towards a general color aspect (top right).
  • Figure 2: Illustration of latent diffusion models. Diffusion in the pixel domain is the particular case where there is no encoder or decoder, i.e. $E_{ldm}=D_{ldm}=I$.
  • Figure 3: Illustration of different methods to control the diffusion process. The region of possible samples is illustrated in blue. The black arrow shows a possible diffusion schedule. The first figure shows a standard diffusion process with no control. Our goal is to focus the output region as close as possible to $\bm z_0| \bm c$.
  • Figure 4: Scaling of the gradient in guidance. In universal guidance, the scaling decreases to $0$; we observe theoretically that guidance for color should follow a different schedule.
  • Figure 5: Encoding-decoding pipeline of our framework. Our guidance term $G$ is added on top of the CoCliCo framework, based on CLIP and color semantics. We provide a color correction during generation.
  • ...and 6 more figures