GCC: Generative Color Constancy via Diffusing a Color Checker

Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang, Yi-Chen Lo, Yu-Chee Tseng, Jiun-Long Huang, Yu-Lun Liu

TL;DR

This work addresses color constancy across camera sensors by using diffusion priors to inpaint a virtual color checker into a scene and estimating the illumination from it. It introduces deterministic, single-step inference at the fixed timestep $t=T$, a Laplacian decomposition that preserves checker structure while letting colors adapt to the illumination, and a mask-based augmentation that tolerates imprecise color-checker annotations. The approach shows strong cross-camera generalization without sensor-specific training, offers practical, efficient white balance across diverse scenes, and can handle spatially varying illumination. The authors note limitations under extreme multi-illuminant conditions and with small datasets, and point to further refinement and data augmentation as future work.

Abstract

Color constancy methods often struggle to generalize across different camera sensors due to varying spectral sensitivities. We present GCC, which leverages diffusion models to inpaint color checkers into images for illumination estimation. Our key innovations include (1) a single-step deterministic inference approach that inpaints color checkers reflecting scene illumination, (2) a Laplacian decomposition technique that preserves checker structure while allowing illumination-dependent color adaptation, and (3) a mask-based data augmentation strategy for handling imprecise color checker annotations. By harnessing rich priors from pre-trained diffusion models, GCC demonstrates strong robustness in challenging cross-camera scenarios. These results highlight our method's effective generalization capability across different camera characteristics without requiring sensor-specific training, making it a versatile and practical solution for real-world applications.
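
To make the idea behind contribution (2) concrete, the following is a minimal sketch of a two-band Laplacian-style split, not the authors' implementation. It assumes a single-channel image stored as nested lists and uses a 3x3 box blur as a stand-in low-pass filter; the names `box_blur` and `laplacian_split` are hypothetical, and the paper applies its decomposition to latents rather than to pixel arrays like this one.

```python
def box_blur(img):
    """Low-pass filter: 3x3 box blur with edge replication (assumed stand-in filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def laplacian_split(img):
    """Split an image into a low band (smooth content) and a high band (edges)."""
    low = box_blur(img)
    high = [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


# A tiny single-channel "checker": two flat patches separated by a vertical edge.
checker = [[0.2, 0.2, 0.8, 0.8] for _ in range(4)]
low, high = laplacian_split(checker)

# Adding the same offset to every pixel changes only the low band.
brighter = [[p + 0.1 for p in row] for row in checker]
low_b, high_b = laplacian_split(brighter)
```

In this toy case the high band is nonzero only around the patch boundary and is unchanged by the uniform offset, which illustrates why a high-frequency component can carry checker structure while leaving the overall color free to vary.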

Paper Structure

This paper contains 32 sections, 3 equations, 11 figures, 9 tables, 1 algorithm.

Figures (11)

  • Figure 1: Overview of our training pipeline. Starting from stable-diffusion-2-inpainting [rombach2021highresolution], we enable color checker generation through end-to-end fine-tuning. Given a ground truth color checker image and its mask, we apply color jittering in the masked region. The input image latent passes through Laplacian decomposition before being concatenated with the masked image latent and the resized mask for the SD Inpainting U-Net. The model is trained with an L2 loss between the inpainted output and ground truth image at a fixed timestep $T$.
  • Figure 2: Overview of our inference pipeline for illumination estimation. A neutral color checker is pasted onto the input image, which is then encoded into the latent space. The input latent is processed through Laplacian composition before being concatenated with the masked image latent and the resized mask. The modified U-Net generates an inpainted result at fixed timestep $T$. After inverse gamma correction, we sample the color checker patches to obtain the final RGB illumination value. We highlight the steps and components that are different from the training pipeline.
  • Figure 3: Analysis of color checker alignment strategies. (a) Direct inpainting on masked regions leads to poor color checker structure. This is because we do not provide guidance on the desired color checker structure, causing the model to generate contours that do not meet our expectations. (b) Using a homography transform to overlay a template suffers from pixel-level misalignment due to imprecise bounding box annotations. (c) Our mask color jittering approach overcomes corner annotation limitations by allowing the model to generate geometrically consistent color checker structures while accurately reflecting scene illumination.
  • Figure 4: Sensitivity to color checker placement. This figure demonstrates the robustness of our method across various color checker positions in a single-light-source scenario. The left part displays different placements of color checkers and their corresponding processed results, showing that our method remains effective under challenging warm color temperatures (which are sparsely represented in the data). The scatter plots on the right quantitatively validate this observation: the estimated illumination values consistently cluster near the ground truth target, confirming the precision and consistency of our approach.
  • Figure 5: Spatially varying illumination in multi-source scenes. From left to right: input image with mixed illumination, illuminant coefficient map showing per-pixel light distribution, our white balanced result, and ground truth white balanced image.
  • ...and 6 more figures
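
The last steps of the inference pipeline in Figure 2 (inverse gamma correction, then sampling the checker patches to obtain an RGB illuminant) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: the plain power-law gamma of 2.2, the averaging over achromatic patches, the unit-norm illuminant, and the green-normalized diagonal white balance are all assumptions, and the function names are hypothetical.

```python
import math


def inverse_gamma(value, gamma=2.2):
    """Map a gamma-encoded value in [0, 1] back to linear intensity (assumed power law)."""
    return value ** gamma


def estimate_illuminant(patches, gamma=2.2):
    """Average linearized RGB samples from achromatic patches; return a unit-norm RGB vector."""
    linear = [[inverse_gamma(c, gamma) for c in rgb] for rgb in patches]
    mean = [sum(rgb[i] for rgb in linear) / len(linear) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in mean))
    return [c / norm for c in mean]


def white_balance(pixel, illuminant):
    """Diagonal (von Kries style) correction of a linear RGB pixel, normalized to green."""
    green = illuminant[1]
    return [pixel[i] * green / illuminant[i] for i in range(3)]


# Hypothetical gamma-encoded samples from two gray patches under a warm light.
gray_patches = [(0.80, 0.62, 0.45), (0.60, 0.46, 0.33)]
illuminant = estimate_illuminant(gray_patches)

# A linear pixel with exactly the illuminant's color should become neutral.
balanced = white_balance(illuminant, illuminant)
```

Dividing each channel by the estimated illuminant is one common way to apply an illuminant estimate; for the spatially varying case in Figure 5, the same correction would be applied per pixel using a per-pixel estimate.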