Innovations in Cover Song Detection: A Lyrics-Based Approach

Maximilian Balluff, Peter Mandl, Christian Wolff

TL;DR

This work tackles cover song detection (CSD) by shifting from audio-centric analysis to a lyrics-based approach. It introduces a large, annotated paired-lyrics dataset and trains a cross-lingual transformer in a Siamese framework with triplet loss to embed lyrics, so that originals and covers can be matched by proximity in embedding space. The proposed triplet model achieves $mAP=87.17\%$, $MR=18.51$, and $P@1=83.57\%$, improving on the strong Bag-of-Words baseline ($mAP=85.74\%$, $MR=46.29$, $P@1=83.65\%$) in mAP and MR, while the baseline remains marginally ahead on $P@1$; runtime and annotation quality also pose practical concerns. The paper highlights the viability of lyrics as a robust signal for CSD, and outlines limitations and future directions toward end-to-end, audio-lyric integrated systems and larger-scale, multilingual datasets.
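The three reported metrics can be illustrated with a minimal sketch. It assumes that each query song yields a ranked list of candidates flagged as relevant (a version of the same work) or not, that every relevant candidate appears somewhere in the ranking, and that MR denotes the mean rank of the first relevant result, as is common in CSD evaluation; the summary does not spell out the paper's exact definitions. The function names are hypothetical and the data is a toy example, not the paper's.

```python
from typing import Dict, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    relevance[i] is True if the candidate ranked at position i + 1 is relevant.
    Assumes all relevant candidates are present in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def first_relevant_rank(relevance: Sequence[bool]) -> int:
    """1-based rank of the first relevant candidate (list length + 1 if none)."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return rank
    return len(relevance) + 1


def evaluate(rankings: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Aggregate mAP, MR (assumed: mean rank of first hit), and P@1 over queries."""
    n = len(rankings)
    return {
        "mAP": sum(average_precision(r) for r in rankings) / n,
        "MR": sum(first_relevant_rank(r) for r in rankings) / n,
        "P@1": sum(1.0 for r in rankings if r and r[0]) / n,
    }


# Toy example with two queries (illustrative data only).
toy_rankings = [
    [True, False, True],   # first hit at rank 1
    [False, True, False],  # first hit at rank 2
]
scores = evaluate(toy_rankings)
```

Under these definitions, higher mAP and P@1 are better, whereas a lower MR is better, which is why the drop from 46.29 to 18.51 counts as an improvement.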

Abstract

Cover songs are alternate versions of a song by a different artist. Long a vital part of the music industry, cover songs significantly influence music culture and are commonly heard in public venues. The rise of online music platforms has further increased their prevalence, often as background music or video soundtracks. While current automatic identification methods serve adequately for original songs, they are less effective with cover songs, primarily because cover versions often deviate significantly from the original compositions. In this paper, we propose a novel method for cover song detection that utilizes the lyrics of a song. We introduce a new dataset of cover songs and their corresponding originals. The dataset contains 5078 cover songs and 2828 original songs. In contrast to other cover song datasets, it contains annotated lyrics for both the original song and the cover song. We evaluate our method on this dataset and compare it with multiple baseline approaches. Our results show that our method outperforms the baseline approaches.

Paper Structure (10 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Levenshtein distance (upper) and WER (lower) between cover and original songs, and between cover and other songs
  • Figure 2: Model architecture
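The two quantities named in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes the standard definitions: Levenshtein distance as the minimum number of insertions, deletions, and substitutions, and word error rate (WER) as the word-level edit distance divided by the number of reference words. Whether the paper normalizes the distances or preprocesses the lyrics is not stated here, and the function names and lyric strings are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    reference_words = reference.split()
    hypothesis_words = hypothesis.split()
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(reference_words, hypothesis_words) / len(reference_words)


# Illustrative strings only, not taken from the dataset.
original_line = "hello darkness my old friend"
cover_line = "hello darkness my dear old friend"
char_distance = edit_distance(original_line, cover_line)
wer = word_error_rate(original_line, cover_line)
```

Intuitively, a cover should sit closer to its original than to unrelated songs on both measures, which is the comparison the figure caption describes.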