PiCoGen: Generate Piano Covers with a Two-stage Approach
Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang
TL;DR
PiCoGen tackles automatic piano cover generation without requiring paired datasets of originals and covers. It proposes a two-stage framework in which an audio-to-lead-sheet extractor is followed by a lead-sheet-to-symbolic-piano generator, formalized as $f = e \circ g$, and uses CP-like tokens to compactly represent both lead sheets and piano outputs. Compared with single-stage methods like Pop2Piano, PiCoGen achieves competitive or superior subjective quality across multiple genres, albeit with some gaps in objective melody accuracy and, in particular, on Hip-hop songs, which are attributed to transcription quality. The approach reduces data requirements, improves interpretability via the lead-sheet intermediate, and opens possibilities for hybrid data strategies that combine unpaired and paired data.
Abstract
Cover song generation stands out as a popular way of music making in the music-creative community. In this study, we introduce Piano Cover Generation (PiCoGen), a two-stage approach for automatic cover song generation that transcribes the melody line and chord progression of a song given its audio recording, and then uses the resulting lead sheet as the condition to generate a piano cover in the symbolic domain. This approach is advantageous in that it does not require paired data of covers and their original songs for training. Compared to an existing approach that demands such paired data, our evaluation shows that PiCoGen demonstrates competitive or even superior performance across songs of different musical genres.
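The two-stage structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`LeadSheet`, `PianoCover`, `extract_lead_sheet`, `generate_piano_cover`, `two_stage`) is hypothetical, and both stages are toy stubs standing in for the trained transcription and generation models. It only shows how the extractor runs first and its lead sheet becomes the sole input to the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names here are hypothetical and chosen for illustration only.


@dataclass
class LeadSheet:
    melody: List[int]  # MIDI pitches of the melody line
    chords: List[str]  # chord symbols of the progression


@dataclass
class PianoCover:
    notes: List[Tuple[int, int]]  # (onset step, MIDI pitch)


def extract_lead_sheet(audio: List[float]) -> LeadSheet:
    """Stage 1 stub: a real system would transcribe melody and chords."""
    # Placeholder output that does not depend on the audio content.
    return LeadSheet(melody=[60, 62, 64, 65], chords=["C", "C", "F", "G"])


def generate_piano_cover(sheet: LeadSheet) -> PianoCover:
    """Stage 2 stub: a real system would sample from a trained model."""
    notes: List[Tuple[int, int]] = []
    for step, pitch in enumerate(sheet.melody):
        notes.append((step, pitch))       # keep the melody note
        notes.append((step, pitch - 12))  # toy accompaniment: octave below
    return PianoCover(notes=notes)


def two_stage(
    extract: Callable[[List[float]], LeadSheet],
    generate: Callable[[LeadSheet], PianoCover],
) -> Callable[[List[float]], PianoCover]:
    """Chain the stages: the generator only ever sees the lead sheet."""
    def pipeline(audio: List[float]) -> PianoCover:
        return generate(extract(audio))
    return pipeline


cover_model = two_stage(extract_lead_sheet, generate_piano_cover)
cover = cover_model([0.0] * 16000)  # dummy one-second audio buffer
print(len(cover.notes))
```

Because the second stage takes only a lead sheet as input, it can in principle be trained on piano performances alone, which is why this design avoids the need for paired original and cover recordings.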
