AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan
TL;DR
AIGCBench presents a comprehensive, scalable benchmark for Image-to-Video (I2V) generation. It addresses the lack of diverse, open-domain evaluation data and establishes a unified framework of 11 metrics across four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. Its data pipeline uses a text combiner and GPT-4 to build rich text prompts. Text-to-Image diffusion models then turn these prompts into images, and the resulting image-text pairs enable fair comparisons across open- and closed-source I2V models. The framework is validated against human judgments and applied both to real-world datasets (WebVid-10M, LAION-5B) and to the generated image-text pairs. This reveals strengths and weaknesses of current models and points to future improvements in fine-grained control, longer video generation, and faster inference. By open-sourcing the dataset and evaluation code, the work aims to standardize I2V benchmarking and accelerate progress across the broader AIGC landscape.
Abstract
The burgeoning field of Artificial Intelligence Generated Content (AIGC) is advancing rapidly, particularly in video generation. This paper introduces AIGCBench, a pioneering, comprehensive, and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. Existing benchmarks lack diverse datasets. AIGCBench addresses this limitation with a varied, open-domain image-text dataset on which different state-of-the-art algorithms are evaluated under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. The metrics include both reference-video-dependent and reference-free (video-free) measures, ensuring a comprehensive evaluation strategy. The proposed evaluation standard correlates well with human judgment and provides insights into the strengths and weaknesses of current I2V algorithms. The findings from our extensive experiments aim to stimulate further research and development in the I2V field. AIGCBench represents a significant step toward standardized benchmarks for the broader AIGC landscape, offering an adaptable and equitable framework for future assessments of video generation tasks. We have open-sourced the dataset and evaluation code on the project website: https://www.benchcouncil.org/AIGCBench.
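The abstract distinguishes metrics that need a reference video from metrics that score the generated video alone. The Python below is a minimal sketch of that distinction, not the paper's implementation. It assumes each frame is already a flat list of floats, such as pixel intensities or a feature vector. `reference_mse` is a reference-dependent score that compares generated frames with ground-truth frames. `adjacent_frame_consistency` is a reference-free temporal-consistency proxy based on cosine similarity between consecutive frames. Both function names and the frame representation are hypothetical, and the benchmark's actual 11 metrics are not specified in this abstract.

```python
import math
from typing import List, Sequence

# Hypothetical frame representation: a frame flattened to a vector of floats.
Frame = Sequence[float]


def reference_mse(generated: List[Frame], reference: List[Frame]) -> float:
    """Reference-dependent metric: per-frame mean squared error, averaged over aligned frames."""
    if not generated or len(generated) != len(reference):
        raise ValueError("need the same, non-zero number of generated and reference frames")
    total = 0.0
    for g, r in zip(generated, reference):
        if len(g) != len(r):
            raise ValueError("paired frames must have equal length")
        total += sum((a - b) ** 2 for a, b in zip(g, r)) / len(g)
    return total / len(generated)


def _cosine(u: Frame, v: Frame) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def adjacent_frame_consistency(generated: List[Frame]) -> float:
    """Reference-free metric: mean cosine similarity between consecutive generated frames."""
    if len(generated) < 2:
        raise ValueError("need at least two frames")
    sims = [_cosine(generated[i], generated[i + 1]) for i in range(len(generated) - 1)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy 3-frame video whose last frame changes abruptly.
    gen = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    ref = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    print(round(reference_mse(gen, ref), 4))  # 0.3333
    print(adjacent_frame_consistency(gen))  # 0.5
```

Only the first function needs `ref`, so only it is usable when no ground-truth video exists, as with image-text pairs produced by a Text-to-Image model. In practice, similarity-based scores like the second are usually computed on learned image embeddings rather than raw pixels. The raw vectors here are a simplifying assumption.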
