From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview

Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller

TL;DR

The paper addresses the rising use of AI in music generation and the need for reliable detection of AI-generated music (AIGM). It surveys music feature representations and detection methods, bridging audio deepfake detection with musicology-driven AIGM detection, and discusses datasets, detectors, and multimodal approaches. It proposes leveraging foundation-model techniques from audio deepfake detection to AIGM detection and outlines future research directions to improve robustness and explainability. The work highlights the importance of intrinsic music features, domain-specific detectors, and the potential societal and industry implications of AIGM.

Abstract

As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic, contextually appropriate content has expanded into various domains. Music, an art form and medium of entertainment deeply rooted in human culture, is seeing increasing involvement of AI in its production. However, despite the effective application of AI music generation (AIGM) tools, their unregulated use raises concerns about potential negative impacts on the music industry, copyright, and artistic integrity, underscoring the importance of effective AIGM detection. This paper provides an overview of existing AIGM detection methods. To lay a foundation for understanding the general workings and challenges of AIGM detection, we first review general principles of AIGM, including recent advancements in audio deepfakes, as well as multimodal detection techniques. We further propose a potential pathway for leveraging foundation models from audio deepfake detection for AIGM detection. Additionally, we discuss the implications of these tools and propose directions for future research to address ongoing challenges in the field.

Paper Structure

This paper contains 13 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Five steps of music production. Composition: scattered melodies, harmonies, and rhythms are created to form the basic musical components. Arrangement: these components are organised into complete pieces through the selection of instruments, harmonic structures, etc. Sound Design: tones are created and modified with synthesisers to achieve the desired sound effects. Mixing: the balance of each audio track is adjusted to improve overall coherence. Mastering: the final optimisation, such as format preparation for different devices and a final check for unintentional errors.