AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
Kazuma Komiya, Yoshihisa Fukuhara
TL;DR
The paper targets expressive and faithful automatic piano cover (APC) generation by proposing AMT-APC, a two-stage framework that fine-tunes a pre-trained automatic music transcription (AMT) model for the APC task. It builds on the hFT-Transformer AMT model, adds a continuous 24-dimensional style vector to control performance style, and trains with a masked cross-entropy loss over onsets, frames, and velocities. On a dataset of 332 songs with 1,267 piano covers, AMT-APC achieves a lower $Q_{\max}$ (0.035) than the baselines, and ablations show that both AMT pre-training and the style vector help. The results indicate a strong link between the AMT and APC tasks and point to future gains from AMT architectures optimized for APC.
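To make the masked cross-entropy loss mentioned above concrete, here is a minimal pure-Python sketch of the general technique. It is not the authors' implementation: the function name `masked_cross_entropy`, the shape of the inputs, and the rule that masked positions are simply skipped and the rest averaged are all assumptions made for illustration. The summary does not say which positions are masked or how the onset, frame, and velocity terms are weighted.

```python
import math
from typing import Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[int],
) -> float:
    """Mean cross-entropy over positions whose mask entry is non-zero.

    logits  : one row of unnormalized class scores per position
    targets : the true class index for each position
    mask    : 1 to keep a position, 0 to ignore it (hypothetical convention)
    """
    total = 0.0
    kept = 0
    for row, target, keep in zip(logits, targets, mask):
        if not keep:
            continue  # masked positions contribute nothing to the loss
        # Numerically stable log-sum-exp gives the log of the softmax denominator.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        # Negative log-probability of the target class.
        total += log_norm - row[target]
        kept += 1
    # Return 0.0 when every position is masked, to avoid dividing by zero.
    return total / kept if kept else 0.0


# Illustrative call: three positions, four classes, middle position masked out.
example_logits = [[0.0, 0.0, 0.0, 0.0], [5.0, -5.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
example_loss = masked_cross_entropy(example_logits, [0, 1, 2], [1, 0, 1])
```

In a setup like the one described, a function of this kind could be applied separately to the onset, frame, and velocity outputs and the three values summed; the equal weighting implied by a plain sum is likewise an assumption rather than something stated here.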
Abstract
There have been several studies on automatically generating piano covers, and recent advancements in deep learning have enabled the creation of more sophisticated covers. However, existing automatic piano cover models still have room for improvement in terms of expressiveness and fidelity to the original track. To address these issues, we propose a learning algorithm called AMT-APC, which builds on well-established automatic music transcription (AMT) models to improve the accuracy of piano cover generation. Our experiments demonstrate that the AMT-APC model reproduces original tracks more accurately than any existing model.
