M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases

Yupei Li, Hanqian Li, Lucia Specia, Björn W. Schuller

TL;DR

Machine-generated music detection (MGMD) lacks diverse, public benchmarks. The authors introduce M6, a large-scale dataset spanning generators, domains, languages, cultures, genres, and instruments, released as WAV audio together with baseline detectors. Data is collected in two stages (human-made music, then machine-generated music, MGM) with quality control and detailed data analysis. Baseline detector results show strong in-domain performance but weak out-of-domain performance. The work lays the groundwork for improved detection methods and for future multimodal, hierarchical models, and the authors commit to releasing data and code to support open, collaborative research.

Abstract

Machine-generated music (MGM) has emerged as a powerful tool with applications in music therapy, personalised editing, and creative inspiration for the music community. However, its unregulated use threatens the entertainment, education, and arts sectors by diminishing the value of high-quality human compositions. Machine-generated music detection (MGMD) is therefore critical to safeguarding these domains, yet the field lacks comprehensive datasets to support meaningful progress. To address this gap, we introduce M6, a large-scale benchmark dataset tailored for MGMD research. M6 is distinguished by its diversity, encompassing multiple generators, domains, languages, cultural contexts, genres, and instruments. We outline our methodology for data selection and collection, accompanied by detailed data analysis, and provide all music in WAV format. Additionally, we provide baseline performance scores using foundational binary classification models, illustrating the complexity of MGMD and the significant room for improvement. By offering a robust and multifaceted resource, we aim to empower future research to develop more effective detection methods for MGM. We believe M6 will serve as a critical step toward addressing this societal challenge. The dataset and code will be freely available to support open collaboration and innovation in this field.

Paper Structure

This paper contains 25 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: The pipeline for collecting our database begins with determining the types of music to include. We aim to collect six key categories: music based on different instruments, different languages and cultures, with or without lyrics, various genres, differing music lengths, and general music without any content restriction. We first assess whether existing research datasets meet our requirements. If they do not, we obtain the necessary music from websites with appropriate licensing. Next, we generate our own set of MGM using either research-based models or commercial models, conditioned on prompts generated by LLMs such as GPT-3.5.
  • Figure 2: AUC-ROC curves for model evaluation. The curves for three models are shown from left to right: (a), (b, c), (d), (e), and (abcde).
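
As a reading aid for the ROC curves in Figure 2, the following minimal sketch shows how a ROC curve and its area (AUC) can be computed for a binary detector. It is not the authors' evaluation code. The function names `roc_curve_points` and `auc`, the label convention (1 = machine-generated, 0 = human-made), and the toy scores are all assumptions made for illustration.

```python
def roc_curve_points(labels, scores):
    """Sweep a decision threshold from high to low and return (fpr, tpr) lists.

    labels: 1 for machine-generated, 0 for human-made (assumed convention).
    scores: detector outputs, higher meaning "more likely machine-generated".
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Visit examples in order of decreasing score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Examples tied at the same score cross the threshold together.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy data (illustrative only, not taken from M6).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_fpr, toy_tpr = roc_curve_points(toy_labels, toy_scores)
toy_auc = auc(toy_fpr, toy_tpr)  # 8/9: 8 of the 9 (positive, negative) pairs are ranked correctly
```

An AUC of 1.0 means the detector ranks every machine-generated clip above every human-made one, while 0.5 corresponds to chance-level ranking. Comparing in-domain and out-of-domain test sets with the same metric is one way to see the kind of gap the summary describes.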