A Survey of AI-Generated Video Evaluation

Xiao Liu, Xinhao Xiang, Zizhong Li, Yongheng Wang, Zhuoheng Li, Zhuosheng Liu, Weidi Zhang, Weiqi Ye, Jiawei Zhang

TL;DR

This survey identifies the emerging field of AI-Generated Video Evaluation (AIGVE), highlights the importance of assessing how well AI-generated videos align with human perception and meet specific instructions, and advocates for more robust and nuanced evaluation frameworks that can handle the complexities of video content.

Abstract

The growing capabilities of AI in generating video content have brought forward significant challenges in effectively evaluating these videos. Unlike static images or text, video content involves complex spatial and temporal dynamics, which may require a more comprehensive and systematic evaluation of aspects such as video presentation quality, semantic information delivery, alignment with human intentions, and consistency between the generated content and our physical world. This survey identifies the emerging field of AI-Generated Video Evaluation (AIGVE), highlighting the importance of assessing how well AI-generated videos align with human perception and meet specific instructions. We provide a structured analysis of existing methodologies that could potentially be used to evaluate AI-generated videos. By outlining the strengths and gaps in current approaches, we advocate for the development of more robust and nuanced evaluation frameworks that can handle the complexities of video content. Such frameworks include not only conventional metric-based evaluations, but also current human-involved evaluations and future model-centered evaluations. This survey aims to establish a foundational knowledge base for both academic researchers and industry practitioners, facilitating the future advancement of evaluation methods for AI-generated video content.

Paper Structure

This paper contains 24 sections, 5 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Evolution of Video Generation Models Over Time.
  • Figure 2: Case Study of AI-Generated Videos. Although current models can generate high-quality videos (the green cases), the generated videos still show flaws under certain conditions, including physical perception errors (the red cases) and incoherence with the instructions (the blue cases). Specifically, the areas in red bounding boxes mark physically anomalous content in the videos, and the blue highlighted text marks mismatches between the human text instructions and the content of the generated videos.
  • Figure 3: The development and overview of AI-Generated Video Evaluation (AIGVE). AIGVE was built on two initially separate aspects: 1) alignment with human perception, and 2) alignment with human instructions. Note that the timeline scales differ between the two aspects. Release date denotes the date on which this survey was released.
  • Figure 4: AIGVE Benchmark Dataset Collection Process.
  • Figure 5: The Proportion of Videos Generated by Each Text-to-video Model.
  • ...and 5 more figures