Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach
Adel N. Abdalla, Jared Osborne, Razvan Andonie
TL;DR
This paper introduces Semantic Manipulation of Music (SMM), an end-to-end pipeline that shifts the emotional content of existing music toward the diametrically opposite state while preserving the original melody. The pipeline combines MIDI-to-audio synthesis, key transposition, Accomontage2 accompaniment generation, and an XLSR-Wav2Vec2-based deep learning classifier that maps audio into the quadrants of Russell's Circumplex model, with results visualized on a 2D emotion plane. The classifier achieves about 70% accuracy on the 4Q Emotional Dataset (near state-of-the-art) and demonstrates robustness on the DEAM dataset. Qualitative results show that both key and timbre (SoundFont) significantly influence emotional shifts, with the Mario SoundFont notably boosting happiness. The work lays groundwork for on-demand remixing and emotion-directed music curation, highlights ethical and copyright considerations, and proposes avenues for more sophisticated transformations and human-in-the-loop validation.
Abstract
Music evokes emotion in many people. We introduce a novel way to manipulate the emotional content of a song using AI tools. Our goal is to achieve the desired emotion while leaving the original melody as intact as possible. To this end, we create an interactive pipeline capable of shifting an input song toward a diametrically opposed emotion and of visualizing the result through Russell's Circumplex model. Our approach is a proof of concept for Semantic Manipulation of Music, a novel field aimed at modifying the emotional content of existing music. We design a deep learning model able to assess the accuracy of our modifications to key, SoundFont instrumentation, and other musical features. The accuracy of our model is in line with current state-of-the-art techniques on the 4Q Emotion dataset. With further refinement, this research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
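To make the notion of a "diametrically opposed" emotion concrete, the following minimal sketch shows one way to place a song on the valence-arousal plane and find its opposite quadrant. It is an illustration only, not the paper's implementation: the function names are hypothetical, the coordinates are assumed to be centered at the origin, and the quadrant numbering (Q1 positive valence and high arousal, Q2 negative valence and high arousal, Q3 negative valence and low arousal, Q4 positive valence and low arousal) is an assumed convention.

```python
from typing import Tuple


def quadrant(valence: float, arousal: float) -> int:
    """Return the circumplex quadrant (1-4) for a point on the emotion plane.

    Assumed convention (not taken from the paper):
      Q1: valence >= 0, arousal >= 0   (e.g. happy, excited)
      Q2: valence <  0, arousal >= 0   (e.g. angry, tense)
      Q3: valence <  0, arousal <  0   (e.g. sad, depressed)
      Q4: valence >= 0, arousal <  0   (e.g. calm, relaxed)
    """
    if valence >= 0:
        return 1 if arousal >= 0 else 4
    return 2 if arousal >= 0 else 3


def opposite_quadrant(q: int) -> int:
    """Return the diametrically opposite quadrant: 1<->3 and 2<->4."""
    if q not in (1, 2, 3, 4):
        raise ValueError(f"quadrant must be 1-4, got {q}")
    return (q + 1) % 4 + 1


def opposite_point(valence: float, arousal: float) -> Tuple[float, float]:
    """Reflect a point through the origin of the valence-arousal plane."""
    return (-valence, -arousal)


if __name__ == "__main__":
    # Hypothetical example: a song rated as pleasant and energetic.
    v, a = 0.7, 0.5
    source = quadrant(v, a)
    target = opposite_quadrant(source)
    print(f"source quadrant: Q{source}, target quadrant: Q{target}")
```

Under this convention, a song classified in Q1 would be steered toward Q3, and a song in Q2 toward Q4; a quadrant classifier such as the one described above could then be used to check whether the modified audio lands in the target quadrant.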
