EMelodyGen: Emotion-Conditioned Melody Generation in ABC Notation with the Musical Feature Template

Monan Zhou, Xiaobing Li, Feng Yu, Wei Li

TL;DR

This work tackles emotion-conditioned melody generation in ABC notation by designing a musical feature template that maps a target emotion to a set of controllable and embedded musical features, guided by correlations observed in emotion-labeled datasets and by findings from music psychology. To overcome data scarcity, it automatically labels well-structured scores to create Rough4Q; a Tunesformer backbone trained on Rough4Q achieves high parsing reliability ($\text{music21 parsing rate} \approx 99\%$) and strong alignment with human emotion perception ($\approx 91\%$ in blind listening tests). Ablation studies show that the five feature controls (mode, tempo, pitchSD, RMS, and octave) jointly shape emotional expression, with tempo, pitchSD, and mode having the largest impact. The results suggest that template-based emotion control, combined with data augmentation and feature embedding, is a viable route to reliable emotion-conditioned melody generation in ABC notation, with practical implications for expressive symbolic music generation.
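To make the idea of a feature template concrete, here is a minimal Python sketch of how a Russell 4Q emotion label might be turned into control settings and written into an ABC tune header. It assumes the usual 4Q reading (Q1 = positive valence and high arousal, Q2 = negative valence and high arousal, Q3 = negative valence and low arousal, Q4 = positive valence and low arousal) and the common music-psychology associations of mode with valence and tempo with arousal. The concrete values (tempo in BPM, pitchSD and RMS levels, octave offsets) and the names `Controls`, `TEMPLATE`, and `abc_header` are hypothetical illustrations, not EMelodyGen's actual template or code; only the five feature names and the standard ABC header fields (`X:`, `M:`, `L:`, `Q:`, `K:`) are taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    mode: str       # "major" or "minor" (assumed to track valence)
    tempo_bpm: int  # quarter-note beats per minute (assumed to track arousal)
    pitch_sd: str   # hypothetical coarse level for pitch spread: "high" / "low"
    rms: str        # hypothetical coarse loudness level: "high" / "low"
    octave: int     # hypothetical octave offset applied to the melody


# HYPOTHETICAL template values, for illustration only (not from the paper).
TEMPLATE = {
    "Q1": Controls("major", 140, "high", "high", +1),
    "Q2": Controls("minor", 140, "high", "high", 0),
    "Q3": Controls("minor", 70, "low", "low", -1),
    "Q4": Controls("major", 70, "low", "low", 0),
}


def abc_header(emotion: str, tonic: str = "C", index: int = 1) -> str:
    """Render the mode and tempo controls as standard ABC header fields."""
    c = TEMPLATE[emotion]
    key = tonic if c.mode == "major" else tonic + "m"  # e.g. "C" vs "Am"
    # K: must be the last header field in ABC; the tune body follows it.
    return "\n".join([
        f"X:{index}",
        "M:4/4",
        "L:1/8",
        f"Q:1/4={c.tempo_bpm}",
        f"K:{key}",
    ])


print(abc_header("Q3", tonic="A"))
```

Only mode and tempo map directly onto ABC header fields in this sketch; pitchSD, RMS, and octave are kept as plain attributes because how the system injects them (for example as embeddings or as edits to the generated notes) is not specified in this summary.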

Abstract

The EMelodyGen system focuses on emotional melody generation in ABC notation controlled by a musical feature template. Owing to the scarcity of well-structured, emotionally labeled sheet music, we designed a template for controlling emotional melody generation based on statistical correlations between musical features and emotion labels, derived from small-scale emotional symbolic music datasets and from findings in music psychology. We then used the template to automatically annotate a large collection of well-structured sheet music with rough emotion labels, converted the scores into ABC notation, and reduced label imbalance through data augmentation, resulting in a dataset named Rough4Q. Our system backbone pre-trained on Rough4Q achieves a music21 parsing rate of up to 99%, and melodies generated with our template reach 91% alignment with the intended emotions in blind listening tests. Ablation studies further validate the effectiveness of the feature controls in the template. Code and demos are available at https://github.com/monetjoe/EMelodyGen.
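The two headline numbers above are both simple ratios. The sketch below assumes that the parsing rate is the share of generated ABC tunes a parser accepts without raising an exception, and that the blind-test alignment is the share of ratings on the diagonal of a prompt-by-perceived-emotion confusion matrix like those in Figure 4. These definitions are our reading, not formulas stated in this summary. `parsing_rate`, `alignment`, and `toy_parse` are hypothetical helpers; the parser is passed in as a callable so that, for instance, a thin wrapper around music21's `converter.parse` could be substituted for the toy stand-in.

```python
from typing import Callable, Sequence


def parsing_rate(tunes: Sequence[str], parse: Callable[[str], object]) -> float:
    """Fraction of tunes for which `parse` returns without raising."""
    if not tunes:
        raise ValueError("no tunes to evaluate")
    ok = 0
    for tune in tunes:
        try:
            parse(tune)
            ok += 1
        except Exception:
            pass  # any parser error counts as a failed parse
    return ok / len(tunes)


def alignment(confusion: Sequence[Sequence[int]]) -> float:
    """Diagonal share of a square confusion matrix (rows = prompts, cols = perceived)."""
    n = len(confusion)
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("empty confusion matrix")
    hits = sum(confusion[i][i] for i in range(n))
    return hits / total


def toy_parse(tune: str) -> None:
    # Hypothetical stand-in parser: rejects tunes without a K: (key) field.
    if "K:" not in tune:
        raise ValueError("missing key field")


print(parsing_rate(["X:1\nK:C\nCDEF|", "X:2\nCDEF|"], toy_parse))  # 0.5
# Illustrative counts only, not results from the paper.
print(alignment([[9, 1, 0, 0], [1, 8, 1, 0], [0, 0, 10, 0], [0, 1, 0, 9]]))  # 0.9
```

Treating any exception as a failure keeps the metric independent of which parser is plugged in.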

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Pie charts with proportions on different emotion categories of the processed datasets.
  • Figure 2: Gaussian KDE charts for Russell 4Q emotions over the six music-related features: key, tempo, average pitch, pitch range, pitchSD, and RMS, respectively (the six subplots on the left side); bar charts of Russell 4Q emotion frequency over modes and directions (the two subplots on the right side).
  • Figure 3: The overall system architecture, showing the training and inference branches of the backbone; the lower part outlines the musical features currently used for control.
  • Figure 4: Confusion matrices between the emotions human listeners assigned to generated melodies in blind tests and the emotion prompts, under full control and ablation options; the vertical axes represent the emotion prompts and the horizontal axes represent the emotions labeled by participants.
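Figure 2 summarizes each feature with a Gaussian kernel density estimate per emotion class. The sketch below is a plain-Python version of that estimator, $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n}\varphi\left(\frac{x - x_i}{h}\right)$ with $\varphi$ the standard normal density, using Scott's rule $h = \hat\sigma\, n^{-1/5}$ for the bandwidth. The bandwidth rule and the helper name `gaussian_kde` are assumptions (SciPy's `scipy.stats.gaussian_kde` also defaults to Scott's rule, but the summary does not say which tool or rule the authors used), and the tempo values are made up for illustration.

```python
import math
import statistics
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a 1-D Gaussian KDE using Scott's rule for the bandwidth."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("samples have zero variance")
    h = sigma * n ** (-1 / 5)  # Scott's rule in one dimension
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Sum of Gaussian bumps centred on each sample, scaled to integrate to 1.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density


# Illustrative tempo values (BPM) for two emotion groups; not data from the paper.
fast_group = [132, 140, 150, 126, 144]
slow_group = [66, 72, 80, 60, 76]
kde_fast, kde_slow = gaussian_kde(fast_group), gaussian_kde(slow_group)
for bpm in (70, 140):
    print(bpm, round(kde_fast(bpm), 4), round(kde_slow(bpm), 4))
```

Evaluating each class's density on a shared grid of feature values gives curves that can be overlaid per emotion, which is the kind of comparison the left-hand subplots of Figure 2 present.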