NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, Trong-Hieu Nguyen-Mau, Minh-Hoang Le, Minh-Khoa Le-Phan, Duy-Nam Ly, Hai-Dang Nguyen, Minh-Triet Tran, Yukang Lin, Yan Hong, Chuanbiao Song, Siyuan Li, Jun Lan, Zhichao Zhang, Xinyue Li, Wei Sun, Zicheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai, Zitong Xu, Huiyu Duan, Jiarui Wang, Guangji Ma, Liu Yang, Lu Liu, Qiang Hu, Xiongkuo Min, Zichuan Wang, Zhenchen Tang, Bo Peng, Jing Dong, Fengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Xin Li, Minhao Lin, Haofeng Chen, Xuanxuan He, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Bo-Cheng Qiu, Chih-Chung Hsu, Chia-ming Lee, Yu-Fan Lin, Bo Yu, Zehao Wang, Da Mu, Mingxiu Chen, Junkang Fang, Huamei Sun, Wending Zhao, Zhiyu Wang, Wang Liu, Weikang Yu, Puhong Duan, Bin Sun, Xudong Kang, Shutao Li, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Jiarong He, Zhishan Qiao, Yongqing Huang, Zewen Chen, Zhe Pang, Juan Wang, Jian Guo, Zhizhuo Shao, Ziyu Feng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Zeming Liu, Qingsong Xie, Ruichen Wang, Zhihao Li, Yuqi Liang, Jianqi Bi, Jun Luo, Junfeng Yang, Can Li, Jing Fu, Hongwei Xu, Mingrui Long, Lulin Tang

TL;DR

This work introduces the NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment, which targets fine-grained evaluation of image-text alignment and structural fidelity in T2I models. The challenge uses two datasets, EvalMuse-40K for alignment and EvalMuse-Structure for structural distortions, and defines an evaluation protocol based on SRCC, PLCC, ACC, and F1, with a dedicated main-score formulation for each track. The paper presents a diverse set of entries, including LVLM fine-tuning, multi-model ensembles, prompt engineering, joint heatmap/score prediction, and two-stage training. Nearly all of them outperform the baselines, and the top entries achieve state-of-the-art fine-grained T2I quality assessment. The results show that fine-grained, modular QA-style evaluation is feasible and useful, both for guiding improvements to T2I models and for making alignment and structural-fidelity assessments more reliable in practical applications.
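The TL;DR names SRCC, PLCC, ACC, and F1 without defining them. The sketch below shows the textbook definition of each metric in plain Python, assuming predicted and ground-truth scores arrive as equal-length lists and binary labels (e.g. element-level alignment decisions, or flattened distortion-mask pixels) as 0/1 lists. The function names are hypothetical, and this is not the challenge's own main-score formulation, which is not reproduced in this excerpt.

```python
from math import sqrt
from statistics import mean


def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: PLCC computed on the rank-transformed scores."""
    return plcc(average_ranks(x), average_ranks(y))


def accuracy(pred, true):
    """Fraction of binary predictions that match the ground-truth labels."""
    if len(pred) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def f1_binary(pred, true):
    """F1 score of the positive class (e.g. 'distorted' pixels in a mask)."""
    tp = sum(1 for p, t in zip(pred, true) if p and t)
    fp = sum(1 for p, t in zip(pred, true) if p and not t)
    fn = sum(1 for p, t in zip(pred, true) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [1.2, 2.5, 3.1, 4.8]
    human = [1.0, 2.0, 3.5, 4.0]
    print("SRCC", srcc(predicted, human), "PLCC", plcc(predicted, human))
    pred_mask = [1, 1, 0, 0, 1]
    true_mask = [1, 0, 0, 1, 1]
    print("ACC", accuracy(pred_mask, true_mask), "F1", f1_binary(pred_mask, true_mask))
```

SRCC only rewards getting the ranking right, so the example scores above, which share the same order as the human scores, reach an SRCC of 1.0 while PLCC stays below 1.0 because the spacing between scores differs. Computing SRCC as PLCC on average ranks is the standard way to handle ties.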

Abstract

This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. The challenge targets fine-grained quality assessment of text-to-image generation models. It evaluates T2I models along two dimensions, image-text alignment and image structural distortion detection, which correspond to the alignment track and the structure track. The alignment track uses EvalMuse-40K, which contains around 40K AI-Generated Images (AIGIs) produced by 20 popular generative models. The alignment track had 371 registered participants; 1,883 submissions were received in the development phase and 507 in the test phase, and 12 participating teams ultimately submitted their models and fact sheets. The structure track uses EvalMuse-Structure, which contains 10,000 AIGIs with corresponding structural distortion masks. The structure track had 211 registered participants; 1,155 submissions were received in the development phase and 487 in the test phase, and 8 participating teams ultimately submitted their models and fact sheets. Almost all methods outperformed the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on T2I model quality assessment.

Paper Structure

This paper contains 35 sections, 5 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Overview of iMatch, proposed by team IH-VQA.
  • Figure 2: Overview of the method proposed by team Evalthon.
  • Figure 3: Overview of the method proposed by team HCMUS.
  • Figure 4: Overview of the method proposed by team MICV.
  • Figure 5: Overview of the method proposed by team SJTU-MMLab.
  • ...and 8 more figures