MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing

Shangda Wu; Yashan Wang; Xiaobing Li; Feng Yu; Maosong Sun

TL;DR

MelodyT5 tackles data scarcity and task fragmentation in symbolic music by unifying seven melody-centric tasks as score-to-score transformations within a Transformer encoder-decoder framework. It introduces bar patching for ABC notation and a multi-layer architecture that includes a patch-level encoder/decoder and a character-level decoder, all pre-trained on MelodyHub to enable effective multi-task transfer learning. The dataset provides over 1,067,747 task instances across diverse tasks, enabling robust pre-training and evaluation. Experiments show MelodyT5 outperforms task-specific baselines on most tasks, with both objective gains (e.g., reduced BPB and improved CTRL, CTnCTR, PCS, MCTD, F1) and positive subjective feedback, highlighting the value of unified score-to-score modeling in symbolic music processing and offering a comprehensive resource for future work.

Abstract

In the domain of symbolic music research, the progress of developing scalable systems has been notably hindered by the scarcity of available training data and the demand for models tailored to specific tasks. To address these issues, we propose MelodyT5, a novel unified framework that leverages an encoder-decoder architecture tailored for symbolic music processing in ABC notation. This framework challenges the conventional task-specific approach, considering various symbolic music tasks as score-to-score transformations. Consequently, it integrates seven melody-centric tasks, from generation to harmonization and segmentation, within a single model. Pre-trained on MelodyHub, a newly curated collection featuring over 261K unique melodies encoded in ABC notation and encompassing more than one million task instances, MelodyT5 demonstrates superior performance in symbolic music processing via multi-task transfer learning. Our findings highlight the efficacy of multi-task transfer learning in symbolic music processing, particularly for data-scarce tasks, challenging the prevailing task-specific paradigms and offering a comprehensive dataset and framework for future explorations in this domain.

Paper Structure (15 sections, 1 equation, 3 figures, 3 tables)

Introduction
Methodology
Data Representation
Model Architecture
Pre-training Objective
Dataset
Melody Curation
Task Definition
Experiments
Settings
Ablation Studies
Comparative Evaluations
Conclusions
Ethics Statement
Acknowledgements

Figures (3)

Figure 1: The MelodyT5 framework employs a Transformer encoder-decoder architecture with bar patching for music processing. It uses linear projection of input bar patches, fed into a patch-level Transformer encoder. The encoder output provides context for a patch-level Transformer decoder to autoregressively produce target bar features. A character-level Transformer decoder then uses these features to generate detailed characters for each bar, forming the target musical score.
Figure 2: Comparative subjective evaluation of MelodyT5 against task-specific baselines in symbolic music tasks, showing vote counts for each model.
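
The bar patching mentioned in the Figure 1 caption can be illustrated with a minimal sketch. It splits an ABC tune body at barlines and turns each bar into a fixed-length patch of character IDs, which is the kind of input a linear projection layer could consume. The patch length of 16, the padding ID of 0, the use of `ord` as a character vocabulary, and the helper names `split_into_bars` and `bar_to_patch` are all assumptions made for illustration; they are not taken from the paper.

```python
import re

PATCH_SIZE = 16  # assumed patch length in characters (illustrative only)
PAD_ID = 0       # assumed padding ID (illustrative only)

# Common ABC barline tokens, multi-character forms listed before the plain "|".
BARLINE = re.compile(r"(\|\]|\|\||\[\||\|:|:\||::|\|)")


def split_into_bars(abc_body: str) -> list[str]:
    """Split an ABC tune body into bars, keeping each closing barline."""
    parts = BARLINE.split(abc_body)
    # With one capturing group, re.split alternates text and barline tokens.
    texts, delims = parts[0::2], parts[1::2]
    bars = [text + delim for text, delim in zip(texts, delims)]
    if texts[-1].strip():
        # Trailing music that is not closed by a barline.
        bars.append(texts[-1])
    return bars


def bar_to_patch(bar: str, patch_size: int = PATCH_SIZE) -> list[int]:
    """Encode one bar as a fixed-length list of character IDs.

    Bars longer than patch_size are truncated; shorter bars are padded.
    """
    ids = [ord(ch) for ch in bar[:patch_size]]
    return ids + [PAD_ID] * (patch_size - len(ids))


bars = split_into_bars("C D E F|G A B c|c4|]")
patches = [bar_to_patch(bar) for bar in bars]
```

In this sketch each bar becomes one sequence element, so the patch-level sequence length equals the number of bars rather than the number of characters, which is the motivation for patching a long character stream.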
