Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

Zhengyao Fang, Zexi Jia, Yijia Zhong, Pengcheng Luo, Jinchao Zhang, Guangming Lu, Jun Yu, Wenjie Pei

TL;DR

A training-free Color Fidelity Refinement (CFR) that adaptively modulates the spatial-temporal guidance scale during generation to enhance color authenticity, completing a progressive framework for assessing and improving color fidelity in realistic-style T2I generation.

Abstract

Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look authentic to real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor vivid images with exaggerated saturation and contrast, so generations are often too vivid to be real even when prompted for realistic-style images. To address this issue, we present the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free Color Fidelity Refinement (CFR) that adaptively modulates the spatial-temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and CFM's learned attention in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.
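
To make the idea of a spatial-temporal guidance scale concrete, the sketch below applies standard classifier-free guidance with a per-pixel, per-step scale instead of a single constant. It is only an illustration under stated assumptions, not the authors' CFR: the function names, the default scales, and the rule that lowers guidance where an attention map is high and as sampling progresses are all hypothetical.

```python
import numpy as np

def normalize_map(attention):
    """Rescale an attention map to [0, 1]; a constant map becomes all zeros."""
    a = np.asarray(attention, dtype=np.float64)
    span = a.max() - a.min()
    if span == 0:
        return np.zeros_like(a)
    return (a - a.min()) / span

def adaptive_scale_map(attention, step, num_steps, base_scale=7.5, min_scale=2.0):
    """Hypothetical rule: start at base_scale everywhere, then lower the scale
    toward min_scale in high-attention regions as sampling progresses."""
    progress = step / max(num_steps - 1, 1)   # temporal factor in [0, 1]
    weight = normalize_map(attention)         # spatial factor in [0, 1]
    return base_scale - (base_scale - min_scale) * progress * weight

def guided_noise(eps_uncond, eps_cond, scale_map):
    """Classifier-free guidance with a spatially varying scale.
    eps_* have shape (H, W, C); scale_map has shape (H, W)."""
    return eps_uncond + scale_map[..., None] * (eps_cond - eps_uncond)

# Toy usage with random stand-ins for the two noise predictions.
rng = np.random.default_rng(0)
attention = rng.random((4, 4))                # assumed to come from a fidelity model
eps_u = rng.standard_normal((4, 4, 3))
eps_c = rng.standard_normal((4, 4, 3))
for step in range(10):
    scale = adaptive_scale_map(attention, step, num_steps=10)
    eps = guided_noise(eps_u, eps_c, scale)
```

A scale map of all ones reduces to the conditional prediction, and a constant map recovers ordinary classifier-free guidance, so the sketch only changes where and when guidance is applied.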

Paper Structure (19 sections, 12 equations, 13 figures, 7 tables)

Figures (13)

  • Figure 1: A. Challenges in existing T2I generation and evaluation. (1) Statistical analysis shows that when prompted to produce realistic-style outputs, most T2I models generate images with higher saturation and contrast than real-world photographs. (2) A controlled saturation-scaling experiment on real photographs reveals that existing evaluation models exhibit a strong bias toward highly saturated images. The vertical axis shows the normalized score difference with respect to the reference scale (=1.0). B. Definition of color fidelity and the target of our CFM. Color fidelity measures how closely generated images preserve the natural color distribution of real-world photography, which serves as the learning objective of our proposed Color Fidelity Metric (CFM).
  • Figure 2: Overview of the Color Fidelity Dataset. We first apply an IQA filtering process to obtain about 190K high-quality real-world images across 12 categories. Through automatic caption generation and guidance-controlled image synthesis, we then construct 1.33M images exhibiting ordered levels of color fidelity and divide them into training and testing splits.
  • Figure 3: Framework of CFM and its training pipeline. CFM employs Qwen2-VL as a multimodal feature backbone and an MLP projection head that maps features to scalar fidelity scores. It is trained on CFD-Training with group-wise, order-preserving samples and optimized with the softrank loss within each group.
  • Figure 4: Benchmark results of color fidelity evaluation across different text-to-image models. Left: category-wise CFM scores across 12 categories. Right: average CFM scores obtained by each model using our benchmark.
  • Figure 5: Examples of evaluation bias between existing metrics and ours. Existing metrics perform less favorably due to their training bias toward vivid, high-contrast images, whereas our proposed CFM provides a more accurate assessment of color realism, assigning higher scores to images with naturally balanced and authentic colors.
  • ...and 8 more figures
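
The controlled saturation-scaling experiment described in Figure 1 can be sketched as follows. This is a minimal illustration, assuming saturation is scaled by blending each pixel with its gray value and that the score difference is normalized by the reference score; the paper's exact scaling and normalization may differ. The `toy_vividness_scorer` is a hypothetical stand-in for an evaluation model, chosen only because it is biased toward saturation by construction.

```python
import numpy as np

def scale_saturation(rgb, factor):
    """Scale saturation of an RGB image with values in [0, 1] by moving each
    pixel away from (factor > 1) or toward (factor < 1) its gray value."""
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb @ np.array([0.299, 0.587, 0.114])   # Rec. 601 luma weights
    out = gray[..., None] + factor * (rgb - gray[..., None])
    return np.clip(out, 0.0, 1.0)

def toy_vividness_scorer(rgb):
    """Hypothetical biased metric: mean HSV saturation of the image."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    return float(sat.mean())

def normalized_score_differences(image, scorer, scales, reference=1.0):
    """Score each saturation-scaled copy and report the difference to the
    reference scale, divided by the reference score (assumed normalization)."""
    ref = scorer(scale_saturation(image, reference))
    return {
        s: (scorer(scale_saturation(image, s)) - ref) / (abs(ref) + 1e-12)
        for s in scales
    }

# Toy usage: a random array stands in for a real photograph.
image = np.random.default_rng(0).random((8, 8, 3))
diffs = normalized_score_differences(
    image, toy_vividness_scorer, scales=[0.5, 1.0, 1.5]
)
```

A metric that tracks color fidelity should peak near the reference scale, whereas a saturation-biased one, like the toy scorer here, keeps rising as the scale increases.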