Synthetic Lyrics Detection Across Languages and Genres

Yanis Labrak, Markus Frohmann, Gabriel Meseguer-Brocal, Elena V. Epure

TL;DR

The paper tackles the problem of detecting AI-generated lyrics across multiple languages and genres. It builds a dedicated synthetic-lyrics dataset with a constrained generation pipeline, then evaluates a spectrum of detectors in cross-language and cross-genre settings. The evaluated detectors span probabilistic, semantic/syntactic, and stylistic features, with a focus on cross-lingual generalization and on unsupervised domain adaptation of LLM2Vec representations via masked next token prediction (MNTP). LLM2Vec-based detectors perform strongly, especially with domain adaptation, and increasing language coverage yields more robust cross-language detection than merely adding more data per language. Human evaluators found the generated lyrics convincingly realistic. The work provides datasets, benchmarks, and insights to improve transparency and fairness in AI-generated music, and outlines practical limitations and directions for future research.
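
The MNTP objective used for domain adaptation in LLM2Vec masks some input tokens and trains the model to predict each masked token from the output at the preceding position. The sketch below shows only how one training example could be built; it is a simplified illustration under stated assumptions, not the authors' code. `MASK_ID`, `mask_prob`, and `mntp_example` are hypothetical names, every masked token is simply replaced by the mask id, and real training would also need tokenization, attention masks, and a loss that ignores positions without a target.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0  # hypothetical id of the mask token in a toy vocabulary


def mntp_example(
    token_ids: List[int], mask_prob: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[Optional[int]]]:
    """Build one simplified masked next token prediction (MNTP) example.

    Position 0 is never masked, because no earlier position exists to
    predict it from. When position i is masked, its original token becomes
    the label at position i - 1, since MNTP predicts it from the previous
    position's output. Positions without a target keep the label None.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels: List[Optional[int]] = [None] * len(token_ids)
    for i in range(1, len(token_ids)):
        if rng.random() < mask_prob:
            inputs[i] = MASK_ID             # hide the token in the input
            labels[i - 1] = token_ids[i]    # predict it one position earlier
    return inputs, labels
```

For example, with `mask_prob=1.0` the sequence `[5, 6, 7, 8]` becomes inputs `[5, 0, 0, 0]` with labels `[6, 7, 8, None]`, making the one-position shift between a masked token and its label explicit.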

Abstract

In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has grown in popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored synthetic content detection in various domains. However, no work has focused on lyrics, the text modality of music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated through both human evaluation and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised domain adaptation. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual content, and perform on novel genres in few-shot settings. Our findings show promising results that could inform policy decisions around AI-generated music and enhance transparency for users.
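
One common way to test how a detector generalizes across languages is a leave-one-language-out split: train on every language but one and evaluate on the held-out language. The summary does not spell out the exact protocol, so the sketch below is an assumption for illustration only; the `Sample` layout and the function name `leave_one_language_out` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample layout: (lyrics, language, label) with 1 = synthetic, 0 = human.
Sample = Tuple[str, str, int]


def leave_one_language_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_language, train, test) splits, one per language.

    The detector is trained on all other languages and tested only on the
    held-out one, so test scores reflect cross-language transfer.
    """
    by_lang: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_lang[sample[1]].append(sample)
    for held_out in sorted(by_lang):
        train = [s for lang, group in by_lang.items() if lang != held_out for s in group]
        yield held_out, train, by_lang[held_out]
```

Grouping by language first keeps each split disjoint by construction, so no held-out-language lyrics can leak into training.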

Paper Structure

This paper contains 42 sections, 5 figures, and 18 tables.

Figures (5)

  • Figure 1: Effect of domain adaptation using additional samples from the evaluation set over 3 seeds (solid circles indicate individual runs), with the mean (open circle) and standard deviation. No adaptation corresponds to the original LLM2Vec model, whereas Unsupervised denotes MNTP-based adaptation. All scenarios use Llama 3 8B.
  • Figure 2: 3-shot lyrics generation template.
  • Figure 3: List of confidence score options and their descriptions.
  • Figure 4: Transcribed interview in the human study.
  • Figure 5: Effect of domain adaptation on per-language performance using additional samples from the evaluation set over 3 seeds (solid circles indicate individual runs), with the mean (open circle) and standard deviation. Note that the vector space is built using songs from all languages. No adaptation corresponds to the original LLM2Vec model, whereas Unsupervised denotes MNTP-based adaptation. All scenarios use Llama 3 8B.
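
Figures 1 and 5 summarize 3 seeded runs per setting by their mean and standard deviation. A minimal sketch of that aggregation follows; it assumes the sample (n - 1) standard deviation, since the captions do not state which estimator is used, and `summarize_runs` is a hypothetical helper name.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_runs(scores_by_setting: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of per-seed scores for each setting.

    Keys can name a setting (e.g. "Unsupervised") or, for per-language plots,
    a language/setting pair. statistics.stdev needs at least two runs.
    """
    return {
        setting: (statistics.mean(scores), statistics.stdev(scores))
        for setting, scores in scores_by_setting.items()
    }
```

With illustrative per-seed scores `[1.0, 2.0, 3.0]` (placeholder values, not results from the paper), the function returns a mean of 2.0 and a standard deviation of 1.0; `statistics.pstdev` would give the population estimate instead.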