PiCoGen: Generate Piano Covers with a Two-stage Approach

Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang

TL;DR

PiCoGen tackles automatic piano cover generation without requiring paired datasets of original songs and their covers. It proposes a two-stage framework in which an audio-to-lead-sheet extractor $e$ is followed by a lead-sheet-to-symbolic-piano generator $g$, so that the overall mapping is $f = g \circ e$. It uses CP-like tokens to represent both lead sheets and piano outputs compactly. Compared with single-stage methods such as Pop2Piano, PiCoGen achieves competitive or superior subjective quality across multiple genres. It lags in objective melody accuracy, however, and particularly on Hip-hop, where the gap is attributed to transcription quality. The approach reduces data requirements, improves interpretability through the intermediate lead sheet, and opens the door to hybrid training strategies that combine unpaired and paired data.
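To illustrate why CP-like tokens are compact, the sketch below encodes the same notes two ways. The first uses one token per attribute, REMI-style. The second uses compound tokens, in the spirit of compound-word (CP) representations, where all attributes of a note share a single sequence step. This is a simplified sketch under stated assumptions: the `Note` fields, the 16th-note grid, and the `None` placeholder for unused fields are hypothetical choices, not PiCoGen's actual vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Note:
    bar: int       # bar index
    position: int  # onset within the bar, in hypothetical 16th-note steps
    pitch: int     # MIDI pitch number
    duration: int  # length in hypothetical 16th-note steps


def _in_time_order(notes: List[Note]) -> List[Note]:
    return sorted(notes, key=lambda n: (n.bar, n.position, n.pitch))


def flat_tokens(notes: List[Note]) -> List[str]:
    """One token per attribute (REMI-like), with a Bar token at each new bar."""
    tokens: List[str] = []
    current_bar: Optional[int] = None
    for n in _in_time_order(notes):
        if n.bar != current_bar:
            tokens.append("Bar")
            current_bar = n.bar
        tokens += [f"Pos_{n.position}", f"Pitch_{n.pitch}", f"Dur_{n.duration}"]
    return tokens


# (family, position, pitch, duration); None marks a field that does not apply.
CompoundToken = Tuple[str, Optional[int], Optional[int], Optional[int]]


def compound_tokens(notes: List[Note]) -> List[CompoundToken]:
    """CP-like: all attributes of a note share one step; a bar line is its own step."""
    tokens: List[CompoundToken] = []
    current_bar: Optional[int] = None
    for n in _in_time_order(notes):
        if n.bar != current_bar:
            tokens.append(("bar", None, None, None))
            current_bar = n.bar
        tokens.append(("note", n.position, n.pitch, n.duration))
    return tokens


notes = [Note(0, 0, 60, 4), Note(0, 4, 64, 4), Note(1, 0, 67, 8)]
print(len(flat_tokens(notes)), len(compound_tokens(notes)))  # 11 5
```

The compound sequence here is less than half as long. The trade-off, in CP-style models generally, is that each step carries several fields that the model must predict together.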

Abstract

Cover song generation stands out as a popular way of music making in the music-creative community. In this study, we introduce Piano Cover Generation (PiCoGen), a two-stage approach for automatic cover song generation that transcribes the melody line and chord progression of a song given its audio recording, and then uses the resulting lead sheet as the condition to generate a piano cover in the symbolic domain. This approach is advantageous in that it does not require paired data of covers and their original songs for training. Compared to an existing approach that demands such paired data, our evaluation shows that PiCoGen demonstrates competitive or even superior performance across songs of different musical genres.
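The two-stage design in the abstract amounts to function composition. An extraction step maps audio to a lead sheet, and a generation step maps that lead sheet to a symbolic piano score. The sketch below wires two stand-in stages together to show the data flow. `LeadSheet`, `two_stage`, `toy_extractor`, `toy_generator`, and `CHORD_ROOTS` are all hypothetical names. The toy stages perform no real transcription or generation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]             # placeholder for a waveform
PianoScore = List[Tuple[int, int]]  # hypothetical (onset_step, midi_pitch) events


@dataclass
class LeadSheet:
    melody: List[int]  # MIDI pitches of the melody line
    chords: List[str]  # chord symbols, one per bar in this toy example


def two_stage(extract: Callable[[Audio], LeadSheet],
              generate: Callable[[LeadSheet], PianoScore]) -> Callable[[Audio], PianoScore]:
    """Build the full audio-to-piano mapping: run extraction first, then generation."""
    def pipeline(audio: Audio) -> PianoScore:
        return generate(extract(audio))
    return pipeline


def toy_extractor(audio: Audio) -> LeadSheet:
    # Stand-in for melody/chord transcription: ignores the audio entirely.
    return LeadSheet(melody=[67, 69, 71], chords=["G"])


CHORD_ROOTS = {"C": 48, "G": 43}  # hypothetical left-hand bass notes (C3, G2)


def toy_generator(sheet: LeadSheet) -> PianoScore:
    # Stand-in for the symbolic generator: places the chord root under every melody note.
    root = CHORD_ROOTS[sheet.chords[0]]
    score: PianoScore = []
    for step, pitch in enumerate(sheet.melody):
        score += [(step, root), (step, pitch)]
    return score


cover = two_stage(toy_extractor, toy_generator)
print(cover([0.0] * 16))  # [(0, 43), (0, 67), (1, 43), (1, 69), (2, 43), (2, 71)]
```

The generation stage only ever sees lead sheets, never the original audio. One plausible reading of the abstract's no-paired-data claim follows from this: the generator can be trained on piano music paired with lead sheets obtained from that music itself, rather than on song/cover pairs.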

Paper Structure (12 sections, 3 figures, 1 table)

Figures (3)

  • Figure 1: An overview of PiCoGen. The model generates a piano cover in two stages: it first extracts a lead sheet (i.e., the melody line and chord progression) from an audio recording of the original song via audio analysis (i.e., transcription), and then turns the extracted lead sheet into a piano performance via conditional symbolic-domain music generation.
  • Figure 2: For each bar (musical measure) $k$ of the input, the Extractor transcribes its lead sheet $L^k$ (a token sequence), and the Performer autoregressively generates the piano performance $S^k$ (also a token sequence) for the same bar. It is conditioned on the current and preceding lead-sheet sequences $[L^1, L^2, \dots, L^k]$ and the preceding piano performances $[S^1, S^2, \dots, S^{k-1}]$, arranged in an interleaved fashion.
  • Figure 3: Mean opinion score (MOS) on the OVL metric in the subjective evaluation, for each of the ten genres considered.
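The bar-by-bar conditioning described for Figure 2 can be sketched as building an interleaved context $L^1, S^1, \dots, L^{k-1}, S^{k-1}, L^k$ and asking a model for $S^k$. This sketch assumes the interleaving alternates strictly bar by bar, as the caption suggests. `build_context`, `generate_cover`, and `toy_model` are hypothetical names. `toy_model` merely stands in for the Performer and emits one token recording the context length, which makes the growth of the context visible.

```python
from typing import Callable, List, Sequence

Token = str
Bar = List[Token]


def build_context(lead: Sequence[Bar], perf: Sequence[Bar], k: int) -> List[Token]:
    """Interleave L^1, S^1, ..., L^{k-1}, S^{k-1}, L^k (k is 1-indexed, as in Figure 2)."""
    if not 1 <= k <= len(lead):
        raise ValueError("k must index an existing lead-sheet bar")
    if len(perf) < k - 1:
        raise ValueError("performances for all preceding bars are required")
    context: List[Token] = []
    for i in range(k - 1):
        context += lead[i]  # lead sheet of bar i+1
        context += perf[i]  # piano performance already generated for bar i+1
    context += lead[k - 1]  # lead sheet of the current bar k
    return context


def generate_cover(lead: Sequence[Bar], generate_bar: Callable[[List[Token]], Bar]) -> List[Bar]:
    """Generate S^1, S^2, ... bar by bar, feeding each new bar back into the context."""
    perf: List[Bar] = []
    for k in range(1, len(lead) + 1):
        perf.append(generate_bar(build_context(lead, perf, k)))
    return perf


def toy_model(context: List[Token]) -> Bar:
    # Hypothetical stand-in for the Performer: one token recording the context length.
    return [f"len={len(context)}"]


lead = [["L1a", "L1b"], ["L2a"], ["L3a", "L3b", "L3c"]]
print(generate_cover(lead, toy_model))  # [['len=2'], ['len=4'], ['len=8']]
```

In a real autoregressive model, `generate_bar` would sample tokens one at a time until some end-of-bar signal. Because the context grows with every bar, long songs would likely require truncating or windowing the earliest bars.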