T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Kaiyue Sun, Rongyao Fang, Chengqi Duan, Xian Liu, Xihui Liu
TL;DR
The paper introduces T2I-ReasonBench, a reasoning-focused benchmark for text-to-image (T2I) generation covering four dimensions (Idiom Interpretation, Textual Image Design, Entity Reasoning, and Scientific Reasoning), paired with a two-stage evaluation framework that uses LLMs and MLLMs to assess reasoning accuracy and image quality. It situates the benchmark within the current T2I landscape, arguing that existing datasets emphasize literal prompt-image alignment and fail to test the deeper reasoning and knowledge integration required for complex scenes. Through an evaluation of 14 state-of-the-art models (diffusion, autoregressive, and proprietary), the study reveals notable gaps between open-source models and proprietary systems, and shows that prompting strategies involving external reasoning can substantially improve performance. The work highlights the potential of combining explicit reasoning modules with generation and suggests future directions involving knowledge bases and broader reasoning tasks, while underscoring ethical considerations around misuse of image synthesis.
Abstract
We propose T2I-ReasonBench, a benchmark for evaluating the reasoning capabilities of text-to-image (T2I) models. It consists of four dimensions: Idiom Interpretation, Textual Image Design, Entity Reasoning, and Scientific Reasoning. We propose a two-stage evaluation protocol to assess reasoning accuracy and image quality. We benchmark a range of T2I generation models and provide a comprehensive analysis of their performance.
