Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

Zhengyao Fang; Zexi Jia; Yijia Zhong; Pengcheng Luo; Jinchao Zhang; Guangming Lu; Jun Yu; Wenjie Pei

TL;DR

The paper proposes a training-free Color Fidelity Refinement (CFR) that adaptively modulates the spatial-temporal guidance scale during generation to enhance color authenticity. Together with the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM), it forms a progressive framework for assessing and improving color fidelity in realistic-style T2I generation.

Abstract

Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look as authentic as real-world photographs remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor vivid images with exaggerated saturation and contrast, so generations are often too vivid to be real even when the prompt asks for a realistic style. To address this issue, we present the Color Fidelity Dataset (CFD) and the Color Fidelity Metric (CFM) for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free Color Fidelity Refinement (CFR) that adaptively modulates the spatial-temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and CFM's learned attention in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. The dataset and code are available at https://github.com/ZhengyaoFang/CFM.
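
As a rough illustration of what modulating the guidance scale over space and time could look like, the sketch below applies standard classifier-free guidance with a per-pixel, per-step scale map instead of a single scalar. The function names, the linear time schedule, the default scale values, and the use of an attention-like map are all assumptions made for illustration. This summary does not give the CFR formulation, so this is not the authors' method.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    `scale` may be a scalar or an array broadcastable to the noise shape,
    which allows a different guidance strength at each spatial location.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

def scale_map(attention, step, num_steps, w_max=7.5, w_min=3.0):
    """Hypothetical spatial-temporal schedule (an assumption, not the paper's rule).

    attention: (H, W) array in [0, 1]; higher values mark regions judged over-vivid.
    The guidance scale is lowered towards w_min where attention is high,
    and the reduction is phased in linearly over the denoising steps.
    """
    progress = step / max(num_steps - 1, 1)  # 0 at the first step, 1 at the last
    reduction = progress * attention * (w_max - w_min)
    return w_max - reduction

# Toy usage with random stand-ins for the model's noise predictions.
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4, 8, 8))   # unconditional prediction (C, H, W)
eps_c = rng.normal(size=(4, 8, 8))   # text-conditioned prediction (C, H, W)
attn = rng.uniform(size=(8, 8))      # placeholder for a fidelity attention map
w = scale_map(attn, step=10, num_steps=20)
eps = guided_noise(eps_u, eps_c, w)  # (H, W) scale broadcasts over channels
```

With a constant scale this reduces to ordinary classifier-free guidance; the only change is that the scale becomes an array that varies by location and denoising step.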

Paper Structure (19 sections, 12 equations, 13 figures, 7 tables)

Introduction
Related Work
Text-to-Image Generation
Text-to-Image Assessment
Color Fidelity Dataset
Preliminaries: Guidance Scale
Dataset Construction
Human Annotation
Color Fidelity Metric
Architecture
Training Objective
Color Fidelity Refinement
Experiment
Implementation Details
Benchmark and Evaluation
...and 4 more sections

Figures (13)

Figure 1: A. Challenges in existing T2I generation and evaluation. (1) Statistical analysis shows that when prompted to produce realistic-style outputs, most T2I models generate images with higher saturation and contrast than real-world photographs. (2) A controlled saturation-scaling experiment on real photographs reveals that existing evaluation models exhibit a strong bias toward highly saturated images. The vertical axis shows the normalized score difference with respect to the reference scale (=1.0). B. Definition of color fidelity and the target of our CFM. Color fidelity measures how closely generated images preserve the natural color distribution of real-world photography, which serves as the learning objective of our proposed Color Fidelity Metric (CFM).
Figure 2: Overview of the Color Fidelity Dataset. We first apply an image quality assessment (IQA) filtering process to obtain about 190K high-quality real-world images across 12 categories. Through automatic caption generation and guidance-controlled image synthesis, we further construct 1.33M images exhibiting ordered levels of color fidelity, and divide them into training and testing splits.
Figure 3: Framework of CFM and training pipeline. CFM employs Qwen2-VL as a multimodal feature backbone and an MLP projection head to map features into scalar fidelity scores. It is trained on the CFD-Training split with group-wise, order-preserving samples, optimized by the softrank loss within each group.
Figure 4: Benchmark results of color fidelity evaluation across different text-to-image models. Left: category-wise CFM scores across 12 categories. Right: average CFM scores obtained by each model using our benchmark.
Figure 5: Examples of evaluation bias between existing metrics and ours. Existing metrics perform less favorably due to their training bias toward vivid, high-contrast images, whereas our proposed CFM provides a more accurate assessment of color realism, assigning higher scores to images with naturally balanced and authentic colors.
...and 8 more figures
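
The controlled saturation-scaling experiment and the saturation/contrast statistics described in the Figure 1 caption can be approximated with a few lines of NumPy. The sketch below is an assumption about how such a measurement might be set up, not the authors' code: `scale_saturation` blends each pixel with its grey value (one common way to scale saturation), and `mean_saturation` and `rms_contrast` are simple stand-ins for the image statistics.

```python
import numpy as np

# Rec. 601 luma weights, used here as the grey reference for each pixel.
LUMA = np.array([0.299, 0.587, 0.114])

def scale_saturation(rgb, factor):
    """Scale colourfulness by moving each pixel away from (or towards) its grey value.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    factor: 1.0 leaves the image unchanged, 0.0 gives greyscale, >1.0 boosts saturation.
    """
    grey = (rgb @ LUMA)[..., None]
    return np.clip(grey + factor * (rgb - grey), 0.0, 1.0)

def mean_saturation(rgb):
    """Mean HSV saturation: (max - min) / max per pixel, 0 for black pixels."""
    c_max = rgb.max(axis=-1)
    c_min = rgb.min(axis=-1)
    sat = np.where(c_max > 0, (c_max - c_min) / np.maximum(c_max, 1e-12), 0.0)
    return float(sat.mean())

def rms_contrast(rgb):
    """RMS contrast: standard deviation of the luma channel."""
    return float((rgb @ LUMA).std())
```

Sweeping `factor` over a range around 1.0 on real photographs, scoring each variant with an evaluation model, and subtracting the score at `factor = 1.0` would give a curve comparable in spirit to the normalized score difference plotted in Figure 1.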
