PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training
Xiao Liang, Zijian Zhao, Weichao Zeng, Yutong He, Fupeng He, Yiyi Wang, Chengying Gao
TL;DR
PianoBART addresses the challenge of jointly learning symbolic piano music generation and understanding when labeled data is scarce. It introduces a BART-based encoder-decoder framework that encodes symbolic music as octuple tokens and pre-trains with a multi-level object selection strategy to prevent information leakage and capture long-range musical structure. Its key contributions are the octuple representation, six pre-training object-selection methods spanning the token/element and time-span levels, and empirical results showing coherent long-form generation and robust music understanding across multiple datasets and tasks. This approach enables scalable, unified modeling of symbolic music, with potential impact on automated composition, music analysis, and downstream MIR tasks.
Abstract
Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not use a shared set of learned features for both tasks. In this paper, we propose PianoBART, a pre-trained model based on BART for both symbolic piano music generation and understanding. We devise a multi-level object selection strategy for the different pre-training tasks of PianoBART, which prevents information leakage or loss and enhances learning ability. The pre-trained model, which captures musical semantics, is then fine-tuned for music generation and understanding tasks. Experiments demonstrate that PianoBART efficiently learns musical patterns and achieves outstanding performance in generating high-quality, coherent pieces and in comprehending music. Our code and supplementary material are available at https://github.com/RS2002/PianoBart.
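
To make the idea of selecting pre-training objects at different levels more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each note is an eight-field tuple whose field names follow the OctupleMIDI convention, and it contrasts masking a single element of scattered tokens with masking every field of all notes inside a chosen bar. The names `Octuple`, `MASK`, `mask_element`, and `mask_bars` are hypothetical and do not come from the paper or its repository.

```python
import random
from typing import List, NamedTuple

MASK = -1  # hypothetical mask id, not taken from the paper


class Octuple(NamedTuple):
    # Field names follow the OctupleMIDI convention (an assumption here).
    bar: int
    position: int
    instrument: int
    pitch: int
    duration: int
    velocity: int
    time_sig: int
    tempo: int


def mask_element(tokens: List[Octuple], field: str, ratio: float,
                 rng: random.Random) -> List[Octuple]:
    """Element level: hide one field in a random subset of tokens."""
    k = min(len(tokens), max(1, round(len(tokens) * ratio)))
    chosen = set(rng.sample(range(len(tokens)), k))
    return [t._replace(**{field: MASK}) if i in chosen else t
            for i, t in enumerate(tokens)]


def mask_bars(tokens: List[Octuple], n_bars: int,
              rng: random.Random) -> List[Octuple]:
    """Time-span level: hide every field of every note in the chosen bars."""
    bars = sorted({t.bar for t in tokens})
    chosen = set(rng.sample(bars, min(n_bars, len(bars))))
    blank = Octuple(*([MASK] * 8))
    return [blank if t.bar in chosen else t for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequence: 4 bars with 4 notes each.
    notes = [Octuple(bar=b, position=p, instrument=0, pitch=60 + p,
                     duration=4, velocity=80, time_sig=0, tempo=120)
             for b in range(4) for p in range(0, 16, 4)]
    for tok in mask_bars(notes, n_bars=1, rng=rng)[:6]:
        print(tok)
```

One way to read the leakage concern under these assumptions: notes in the same bar share fields such as bar index, tempo, and time signature, so hiding a field in a single note can leave its value visible in neighbouring notes. Masking the whole span removes that shortcut, at the cost of hiding more context.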
