AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo, Hossein Rahmani, Jun Liu
TL;DR
This survey comprehensively maps AI-generated content across text, image, video, 3D, and audio modalities, foregrounding machine learning and diffusion-based methods while detailing cross-modality generation (e.g., text-to-image, text-to-3D, text-to-video). It introduces a modality-centric taxonomy that separates single-modality AIGC into unconditional and conditional generation, and then organizes cross-modality work by output modality and conditioning input. The paper catalogs representative datasets, benchmarks, and trends, and discusses core challenges, applications, and future directions, including data availability, privacy, and IP concerns. By presenting standardized comparisons within a unified framework, it aims to guide future research and to help practitioners deploy multi-modal AIGC systems with awareness of their capabilities and limitations.
Abstract
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Owing to the wide range of applications of AIGC and the demonstrated potential of recent works, AIGC developments -- especially in Machine Learning (ML) and Deep Learning (DL) -- have been attracting significant attention, and this survey focuses on comprehensively reviewing such advancements in ML/DL. AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape, 3D scene, 3D human avatar, 3D motion, and audio -- each presenting unique characteristics and challenges. Furthermore, there have been significant developments in cross-modality AIGC methods, where generative methods receive conditioning input in one modality and produce outputs in another. Examples include generating image, video, 3D, and audio outputs from conditioning inputs in other modalities. This paper provides a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey representative datasets across the modalities and present comparative results for various modalities. Moreover, we discuss typical applications of AIGC methods in various domains, along with open challenges and future research directions.
