Fine color guidance in diffusion models and its application to image compression at extremely low bitrates
Tom Bordin, Thomas Maugey
TL;DR
This work introduces a training-free fine color guidance for diffusion models to steer outputs toward a predefined color map, applicable to both pixel- and latent-diffusion models. The color map is modeled as the low-frequency content via a 2D-DCT representation, and a new guidance term is derived by unrolling classifier guidance specifically for color conditioning. The method is integrated into a compression framework (CoCliCo) to enable semantic-plus-color reconstruction at ultra-low bitrates, outperforming training-free baselines and approaching semantic-compression methods in preserving color fidelity and realism. The results demonstrate improved color fidelity, perceptual realism, and semantic preservation, with the ability to operate without retraining and to scale across different diffusion frameworks, including latent-space models. This has practical implications for efficient, high-quality image compression and controlled image generation in resource-constrained settings.
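To make the "low-frequency content via a 2D-DCT representation" idea concrete, here is a minimal sketch of one way to extract a coarse color map by keeping only the lowest DCT coefficients of each channel. The function names (`dct_matrix`, `low_freq_color_map`) and the `keep` parameter are illustrative assumptions, not names from the paper, and the exact coefficient selection used by the authors may differ.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return mat

def low_freq_color_map(image: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the keep x keep lowest 2D-DCT coefficients of each channel.

    image: float array of shape (H, W, C).
    keep:  hypothetical parameter, number of low frequencies kept per axis.
    Returns an array of the same shape containing only low-frequency content.
    """
    h, w, _ = image.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT per channel: D_h @ X @ D_w^T
    coeffs = np.einsum("ih,hwc,jw->ijc", dh, image, dw)
    # Zero every coefficient outside the low-frequency corner
    mask = np.zeros((h, w, 1))
    mask[:keep, :keep] = 1.0
    coeffs = coeffs * mask
    # Inverse 2D DCT per channel: D_h^T @ C @ D_w
    return np.einsum("ih,ijc,jw->hwc", dh, coeffs, dw)
```

With `keep=1` only the DC coefficient survives, so each channel collapses to its mean color; larger values of `keep` retain progressively finer color variation at the cost of more coefficients to store.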
Abstract
This study addresses the challenge of controlling the global color aspect of images generated with a diffusion model, without training or fine-tuning. We rewrite the guidance equations so that the outputs stay closer to a known color map without hindering the quality of the generation. Our method leads to new guidance equations. We show that, in the color guidance context, the scaling of the guidance should not decrease but should remain high throughout the diffusion process. As a second contribution, we apply our guidance in a compression framework, where we combine semantic and general color information about the image to decode images at low cost. We show that our method is effective at improving the fidelity and realism of compressed images at extremely low bit rates when compared to other classical or more semantic-oriented approaches.
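The point about guidance scaling can be illustrated with a toy experiment. The sketch below is not the paper's guidance equations: it uses a generic gradient step on a squared color-map error, a block-average operator as a stand-in for the low-frequency color map, and a simple drift toward a fixed image as a stand-in for the denoiser. All names (`block_mean`, `color_guidance_grad`, `run`, `drift`) are hypothetical. It only shows why a guidance scale that decays over the steps can let the colors drift back toward the unguided output.

```python
import numpy as np

def block_mean(x: np.ndarray, block: int) -> np.ndarray:
    """Coarse color map: mean color of each block x block tile of an (H, W, C) image."""
    h, w, c = x.shape
    tiles = x.reshape(h // block, block, w // block, block, c)
    return tiles.mean(axis=(1, 3))

def color_guidance_grad(x: np.ndarray, target_map: np.ndarray, block: int) -> np.ndarray:
    """Gradient of 0.5 * ||block_mean(x) - target_map||^2 with respect to x."""
    residual = block_mean(x, block) - target_map
    # Each pixel contributes 1 / block^2 to the mean of its tile
    up = np.repeat(np.repeat(residual, block, axis=0), block, axis=1)
    return up / (block * block)

def run(scales, x0, unguided, target_map, block, drift=0.3):
    """Alternate a toy 'denoiser' drift with a color guidance step."""
    x = x0.copy()
    for s in scales:
        x = x + drift * (unguided - x)  # assumed stand-in for the model update
        x = x - s * color_guidance_grad(x, target_map, block)
    return x

rng = np.random.default_rng(0)
block, steps = 4, 10
unguided = rng.random((8, 8, 3))    # what the toy model would produce on its own
target_map = rng.random((2, 2, 3))  # desired coarse colors
x0 = rng.standard_normal((8, 8, 3))

base = 0.8 * block * block
constant = [base] * steps
decaying = [base * (1 - t / steps) for t in range(steps)]

def color_error(x):
    return float(np.abs(block_mean(x, block) - target_map).mean())

err_const = color_error(run(constant, x0, unguided, target_map, block))
err_decay = color_error(run(decaying, x0, unguided, target_map, block))
print(f"constant scale: {err_const:.4f}, decaying scale: {err_decay:.4f}")
```

In this toy setting the constant schedule ends with a smaller color-map error than the decaying one, because the last model updates are no longer counteracted once the guidance weight has shrunk.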
