
Spectrogram-Based Detection of Auto-Tuned Vocals in Music Recordings

Mahyar Gohari, Paolo Bestagini, Sergio Benini, Nicola Adami

TL;DR

This study introduces a data-driven approach that leverages triplet networks to detect Auto-Tuned songs. It is supported by a new dataset of original and Auto-Tuned audio clips, and the proposed method outperforms two baseline models in both accuracy and robustness.

Abstract

In the domain of music production and audio processing, the implementation of automatic pitch correction of the singing voice, also known as Auto-Tune, has significantly transformed the landscape of vocal performance. While auto-tuning technology has offered musicians the ability to tune their vocal pitches and achieve a desired level of precision, its use has also sparked debates regarding its impact on authenticity and artistic integrity. As a result, detecting and analyzing Auto-Tuned vocals in music recordings has become essential for music scholars, producers, and listeners. However, to the best of our knowledge, no prior effort has been made in this direction. This study introduces a data-driven approach leveraging triplet networks for the detection of Auto-Tuned songs, backed by the creation of a dataset composed of original and Auto-Tuned audio clips. The experimental results demonstrate the superiority of the proposed method in both accuracy and robustness compared to RawNet2, an end-to-end model proposed for anti-spoofing and widely used for other audio forensic tasks.
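The abstract names triplet networks but does not spell out the training objective. The following is a minimal sketch of the standard triplet margin loss that such networks are commonly trained with, assuming Euclidean distances between embeddings and a margin of 1.0. The embedding network, how triplets are drawn from the dataset, and the margin value are not given in this summary and are assumptions; the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """L2 distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(triplets, margin: float = 1.0) -> float:
    """Average the per-triplet loss over a batch of (anchor, positive, negative) tuples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    if not losses:
        raise ValueError("batch must contain at least one triplet")
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings (ASSUMPTION): anchor and positive share a class,
# e.g. both Auto-Tuned clips, while the negative is an original clip.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))  # 0.0: d(a,p)=1, d(a,n)=5, gap exceeds the margin
print(triplet_loss(anchor, negative, positive))  # 5.0: roles swapped, 5 - 1 + 1
```

Minimizing this loss pulls an anchor toward a same-class example and pushes it away from an other-class example, so Auto-Tuned and original clips form separable clusters in embedding space. How the final Auto-Tuned/original decision is then made from the embeddings is not stated in this summary.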

Paper Structure

This paper contains 15 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Pipeline of the proposed method.
  • Figure 2: The spectrogram of a 10-second vocal (a) and its corresponding Auto-Tuned version (b).
  • Figure 3: The schematic of dataset creation pipelines: $\mathcal{D}_1$ derived from VocalSet (a), $\mathcal{D}_2$ and $\mathcal{D}_3$ derived from Musdb18 (b), and the test dataset ($\mathcal{D}_4$) creation process (c).
  • Figure 4: Song-level performance curves of the models across varying thresholds, measured in terms of accuracy (a), precision (b), and recall (c).
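Figure 4 reports song-level accuracy, precision, and recall as the decision threshold varies. The sketch below shows one way such curves can be computed from model scores. It assumes each clip receives a score in [0, 1], where higher means "more likely Auto-Tuned", and, as a purely hypothetical aggregation rule, that a song's score is the mean of its clip scores. The song names and scores are invented for illustration and are not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Tuple


def song_scores(clip_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """ASSUMPTION: a song's score is the mean of its clip-level Auto-Tune scores."""
    return {song: mean(scores) for song, scores in clip_scores.items()}


def metrics_at(scores: Dict[str, float], labels: Dict[str, int],
               threshold: float) -> Tuple[float, float, float]:
    """Accuracy, precision, recall when songs scoring >= threshold are flagged as Auto-Tuned (label 1)."""
    tp = fp = tn = fn = 0
    for song, score in scores.items():
        predicted = 1 if score >= threshold else 0
        actual = labels[song]
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(scores)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 when no song is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0     # convention: 0.0 when no positives exist
    return accuracy, precision, recall


# Hypothetical clip scores for four songs (label 1 = Auto-Tuned, 0 = original).
clips = {
    "song_a": [0.9, 0.8, 0.7],
    "song_b": [0.6, 0.2, 0.4],
    "song_c": [0.1, 0.3, 0.2],
    "song_d": [0.7, 0.5, 0.6],
}
labels = {"song_a": 1, "song_b": 0, "song_c": 0, "song_d": 1}

scores = song_scores(clips)
for t in (0.25, 0.5, 0.75):
    acc, prec, rec = metrics_at(scores, labels, t)
    print(f"threshold={t:.2f} accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Sweeping the threshold over a finer grid and plotting each metric against it yields curves of the kind shown in Figure 4: a low threshold favors recall, a high threshold favors precision.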