Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

Yueying Zou; Peipei Li; Zekun Li; Huaibo Huang; Xing Cui; Xuannan Liu; Chenghanyu Zhang; Ran He

TL;DR

This survey covers AI-generated media detection, focusing on the transition from domain-specific Non-MLLM detectors to general-purpose MLLM-based detectors. It provides a structured taxonomy across single-modal and multimodal tasks (authenticity, explainability, and localization) and compares methods, datasets, and evaluation metrics while highlighting ethical and regulatory implications. The work identifies key gaps, such as explainability and localization in video and audio, and discusses hybrid approaches that combine specialized detectors with generalized MLLMs. By detailing benchmarks, policy frameworks, and future directions, the paper offers researchers and policymakers a foundation for advancing robust, transparent, and secure GenAI detection technologies.

Abstract

The proliferation of AI-generated media poses significant challenges to information authenticity and social trust, creating strong demand for reliable detection methods. Methods for detecting AI-generated media have evolved rapidly, paralleling the advancement of Multimodal Large Language Models (MLLMs). Current detection approaches fall into two main groups: Non-MLLM-based and MLLM-based methods. The former employs high-precision, domain-specific detectors powered by deep learning techniques, while the latter uses general-purpose detectors based on MLLMs that integrate authenticity verification, explainability, and localization capabilities. Despite significant progress in this field, the literature still lacks a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods. This paper addresses this gap by providing a systematic review of both approaches, analyzing them from single-modal and multimodal perspectives. We present a detailed comparative analysis of these categories, examining their methodological similarities and differences. Through this analysis, we explore potential hybrid approaches and identify key challenges in forgery detection, providing direction for future research. Additionally, as MLLMs become increasingly prevalent in detection tasks, ethical and security considerations have emerged as critical global concerns. We examine the regulatory landscape surrounding Generative AI (GenAI) across various jurisdictions, offering insights for researchers and practitioners in this field.
