REMAST: Real-time Emotion-based Music Arrangement with Soft Transition

Zihao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang

TL;DR

REMAST tackles the challenge of real-time emotion-based music arrangement by recognizing the previous segment's emotion and fusing it with the current target emotion to condition a Transformer-based generator. It introduces a downsampling arrangement pipeline and four music-theory features to enrich emotional information, and employs semi-supervised learning to leverage unlabeled data. Through objective and subjective evaluations, REMAST outperforms baselines in music coherence and similarity while maintaining strong real-time emotion fit, and demonstrates potential for anxiety-relief applications. The approach enables smooth emotional transitions in real-time music and offers practical benefits for therapy, gaming, and media scoring.

Abstract

Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music must be arranged in real time as emotions change, and because the target emotion is fine-grained and mutable, it is challenging to balance real-time emotion fit against soft emotion transition. Existing studies mainly focus on real-time emotion fit, while smooth transition remains understudied, which affects the overall emotional coherence of the music. In this paper, we propose REMAST to address this trade-off. Specifically, we recognize the music emotion of the last timestep and fuse it with the input emotion of the current timestep. The fused emotion then guides REMAST to generate music based on the input melody. To adjust music similarity and real-time emotion fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features based on domain knowledge to enhance emotion information, and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, REMAST surpasses state-of-the-art methods in both objective and subjective metrics. These results demonstrate that REMAST achieves real-time fit and smooth transition simultaneously, enhancing the coherence of the generated music.
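
The abstract's two control ideas, fusing the previously recognized emotion with the current target emotion and downsampling the input melody, can be illustrated with a minimal sketch. It is not the paper's method. The abstract does not say how the fusion is computed or how the melody is downsampled, so everything here is an assumption made for illustration: the (valence, arousal) representation, the linear blend with weight `alpha`, the note-skipping rule, and the names `fuse_emotion` and `downsample_melody`.

```python
from typing import List, Tuple

# Assumption: an emotion is a (valence, arousal) pair, each in [-1, 1].
Emotion = Tuple[float, float]


def fuse_emotion(recognized_prev: Emotion, target_now: Emotion,
                 alpha: float = 0.5) -> Emotion:
    """Blend the emotion recognized in the last segment with the current target.

    Hypothetical rule: linear interpolation. alpha=1.0 follows the target
    exactly (pure real-time fit); alpha=0.0 keeps the previous emotion
    (no transition at all). Values in between soften the transition.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    valence = alpha * target_now[0] + (1.0 - alpha) * recognized_prev[0]
    arousal = alpha * target_now[1] + (1.0 - alpha) * recognized_prev[1]
    return (valence, arousal)


def downsample_melody(pitches: List[int], keep_every: int = 2) -> List[int]:
    """Keep every `keep_every`-th note of a melody given as MIDI pitches.

    Hypothetical rule: a sparser melody constrains the generator less, so it
    can follow the emotion more freely at the cost of similarity.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return pitches[::keep_every]


if __name__ == "__main__":
    previous = (0.0, 0.0)   # emotion recognized in the last segment
    target = (1.0, -1.0)    # emotion requested for the current segment
    print(fuse_emotion(previous, target, alpha=0.5))
    print(downsample_melody([60, 62, 64, 65, 67], keep_every=2))
```

In this sketch, `alpha` controls the balance between real-time fit and soft transition, and `keep_every` controls the balance between similarity to the original melody and freedom to follow the emotion.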

Paper Structure (28 sections, 4 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: The workflow of utilizing REMAST in various scenarios. The user selects the original music at the beginning, and REMAST obtains the real-time emotion.
  • Figure 2: The overall architecture for REMAST. 1) In the recognition phase, REMAST recognizes the emotion of the last timestep's music segment. 2) In the generation phase, REMAST fuses the last timestep's recognized music emotion with the current timestep's target input emotion and generates the current timestep’s music segment based on the fused emotion.
  • Figure 3: The structure of the music emotion recognition model, which outputs the emotion sequence from the input music content and the music theory features.
  • Figure 4: The circle of fifths serves as the foundation for Harmonic Color, quantifying chord freshness by assigning each note a position and value.
  • Figure 5: Composition of the Contour Factor, whose elements include extremum, trend, and concave-convex property.
  • ...and 7 more figures
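
The Figure 4 caption says each note is assigned a position on the circle of fifths. The sketch below shows only the standard music-theory mapping from pitch class to circle-of-fifths position. It does not reproduce the paper's Harmonic Color feature or its per-note values, which this page does not define. The names `fifths_position` and `fifths_distance` are hypothetical.

```python
def fifths_position(pitch: int) -> int:
    """Clockwise steps from C on the circle of fifths.

    Accepts a pitch class (0 = C, ..., 11 = B) or any MIDI note number.
    Each step on the circle is a perfect fifth (7 semitones), and 7 is its
    own inverse modulo 12, so the position is (7 * pitch) mod 12.
    """
    return (pitch * 7) % 12


def fifths_distance(pitch_a: int, pitch_b: int) -> int:
    """Shortest number of steps between two notes around the circle (0 to 6)."""
    diff = (fifths_position(pitch_a) - fifths_position(pitch_b)) % 12
    return min(diff, 12 - diff)


if __name__ == "__main__":
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    for pc, name in enumerate(names):
        print(name, fifths_position(pc))
```

Notes that are close on the circle (for example C and G) tend to sound closely related, while distant ones (C and F#) sound more remote. A "freshness" measure based on the circle could build on that intuition.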