From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller
TL;DR
The paper addresses the rising use of AI in music generation and the need for reliable detection of AI-generated music (AIGM). It surveys music feature representations and detection methods, bridging audio deepfake detection with musicology-driven AIGM detection, and discusses datasets, detectors, and multimodal approaches. It proposes a pathway for adapting foundation-model techniques from audio deepfake detection to AIGM detection and outlines future research directions to improve robustness and explainability. The work highlights the importance of intrinsic music features and domain-specific detectors, as well as the potential societal and industry implications of AIGM.
Abstract
As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic, contextually appropriate content has expanded into various domains. Music, an art form and medium for entertainment that is deeply rooted in human culture, is seeing increasing involvement of AI in its production. However, despite the effective application of AI music generation (AIGM) tools, their unregulated use raises concerns about potential negative impacts on the music industry, copyright, and artistic integrity, underscoring the importance of effective AIGM detection. This paper provides an overview of existing AIGM detection methods. To lay a foundation for understanding the general workings and challenges of AIGM detection, we first review general principles of AIGM, including recent advancements in deepfake audio, as well as multimodal detection techniques. We further propose a potential pathway for leveraging foundation models from audio deepfake detection for AIGM detection. Additionally, we discuss the implications of these tools and propose directions for future research to address ongoing challenges in the field.
