From Real to Cloned Singer Identification
Dorian Desblancs, Gabriel Meseguer-Brocal, Romain Hennequin, Manuel Moussallam
TL;DR
This work tackles the problem of identifying the original singer behind AI-generated voice clones in music. It introduces three embedding models trained with singer-level contrastive learning, taking mixtures, vocal stems, or both as input, and evaluates them on open (FMA, MTG) and large closed datasets, plus cloned-voice tracks. Real-singer identification is strong across all models, but performance drops sharply for cloned voices, especially with mixture-based inputs. This points to a bias toward instrumental context and to the need for robust, cloning-aware systems. The study provides open-source singer identification splits to benchmark progress and discusses future directions, including few-shot learning on cloned voices, to better combat voice deepfakes in music, with practical implications for policy and platform decisions.
Abstract
Cloned voices of popular singers sound increasingly realistic and have gained popularity over the past few years. However, they pose a threat to the industry due to personality rights concerns. As such, methods to identify the original singer in synthetic voices are needed. In this paper, we investigate how singer identification methods could be used for such a task. We present three embedding models that are trained using a singer-level contrastive learning scheme, where positive pairs consist of segments with vocals from the same singer. These segments can be mixtures for the first model, vocals for the second, and both for the third. We demonstrate that all three models are highly capable of identifying real singers. However, their performance deteriorates when classifying cloned versions of singers in our evaluation set. This is especially true for models that use mixtures as input. These findings highlight the need to understand the biases that exist within singer identification systems, and how they can influence the identification of voice deepfakes in music.
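To make the singer-level contrastive scheme concrete, here is a minimal sketch of one common contrastive objective, InfoNCE, applied to pairs of segment embeddings from the same singer. The abstract does not state which loss the models use, so the choice of InfoNCE, the function name `info_nce_loss`, the temperature of 0.1, and the random vectors standing in for embeddings are all assumptions made for illustration. The sketch also assumes each singer appears only once per batch, so that every off-diagonal pair is a true negative.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of (anchor, positive) embedding pairs.

    Row i of `anchors` and row i of `positives` are assumed to be embeddings
    of two segments with vocals from the same singer. Every other row of
    `positives` is treated as a negative for anchor i.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature: shape (batch, batch).
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The same-singer pair for anchor i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy data (hypothetical): 4 singers, 16-dimensional embeddings.
rng = np.random.default_rng(0)
singer_centers = rng.normal(size=(4, 16))
anchors = singer_centers + 0.01 * rng.normal(size=(4, 16))
positives = singer_centers + 0.01 * rng.normal(size=(4, 16))

# Correctly paired segments should give a lower loss than mismatched ones.
aligned = info_nce_loss(anchors, positives)
shuffled = info_nce_loss(anchors, positives[::-1])
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {shuffled:.4f}")
```

Under this sketch, the three models in the abstract would differ only in what is fed to the encoder that produces `anchors` and `positives`: mixture segments, vocal-stem segments, or both.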
