Exploring and Applying Audio-Based Sentiment Analysis in Music

Etash Jhanji

TL;DR

This study trains models for two tasks: predicting the emotion of a musical clip over time, and predicting the next emotion value after the clip in a time series to enable seamless transitions.

Abstract

Sentiment analysis is an actively studied area of text processing that deals with the computational analysis of opinions, sentiments, and subjectivity in text. However, the idea is not limited to text and speech; it can be applied to other modalities as well. Humans often express themselves more deeply through music than through text, yet the ability of computational models to interpret musical emotion remains largely unexplored, with potential uses in therapy and music queuing. This paper addresses two tasks: (1) predicting the emotion of a musical clip over time and (2) predicting the next emotion value after the clip in a time series to enable seamless transitions. Models for both tasks are trained on the Emotions in Music Database, which contains song clips selected from the Free Music Archive, each annotated by multiple volunteers with levels of valence and arousal as defined by Russell's circumplex model of affect. Overall, the trained models performed both tasks effectively and accurately.
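Task (2) amounts to one-step-ahead forecasting over each clip's (valence, arousal) trajectory. The sketch below shows one way to frame it under stated assumptions; it is not the study's method. The fixed input `window`, the helper names (`make_next_step_pairs`, `persistence_forecast`, `mse`), and the toy values are hypothetical, and the persistence baseline (repeat the last observed value) is a common reference point rather than a model from this paper. Scoring uses mean squared error, the loss shown in Figures 3 and 4.

```python
from typing import List, Tuple

# One annotation per time step: (valence, arousal).
# Assumption: values are scaled to [-1, 1]; the sketch does not depend on the range.
Point = Tuple[float, float]


def make_next_step_pairs(series: List[Point], window: int) -> List[Tuple[List[Point], Point]]:
    """Split one clip's emotion trajectory into (history, next value) pairs.

    `window` (number of past steps used as input) is a hypothetical
    parameter, not a value taken from the study.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for end in range(window, len(series)):
        pairs.append((series[end - window:end], series[end]))
    return pairs


def persistence_forecast(history: List[Point]) -> Point:
    """Naive baseline: predict that the next (valence, arousal) equals the last one."""
    return history[-1]


def mse(preds: List[Point], targets: List[Point]) -> float:
    """Mean squared error averaged over both dimensions (valence and arousal)."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need equal-length, non-empty sequences")
    total = 0.0
    for (pv, pa), (tv, ta) in zip(preds, targets):
        total += (pv - tv) ** 2 + (pa - ta) ** 2
    return total / (2 * len(preds))


if __name__ == "__main__":
    # Toy trajectory with made-up values, for illustration only.
    clip = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.35), (0.30, 0.40), (0.35, 0.50)]
    pairs = make_next_step_pairs(clip, window=2)
    preds = [persistence_forecast(history) for history, _ in pairs]
    targets = [target for _, target in pairs]
    print(len(pairs), round(mse(preds, targets), 5))
```

A learned model for task (2) would take the place of `persistence_forecast`; comparing its MSE with the baseline's on held-out clips indicates whether it captures more than short-term continuity.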

Paper Structure

This paper contains 18 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Russell's Circumplex Model of Affect
  • Figure 2: The full audio-processing pipeline, including clipping, mel spectrogram computation, and storage format.
  • Figure 3: Training and validation loss (MSE) versus number of epochs for task 1, using the best hyperparameters found. Training loss converges, but the validation curve suggests possible slight overfitting.
  • Figure 4: Training and validation loss (MSE) versus number of epochs for task 2, using the best hyperparameters found. The loss converges quickly after the first two epochs.
  • Figure 5: Arousal and valence for one song plotted against time on the z-axis. Dotted lines represent the centerlines and edges of the circumplex model.
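The clipping and mel spectrogram stages of the pipeline in Figure 2 can be sketched in NumPy as below. This is a minimal illustration, not the study's code: the sample rate, clip length, FFT size (`n_fft`), hop length, number of mel bands, and the HTK-style mel formula are all assumptions, and every function name is hypothetical. The storage step is omitted because the caption does not specify the format. In practice a library such as librosa (`librosa.feature.melspectrogram`) would usually replace the hand-rolled filterbank.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; librosa defaults to the Slaney variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points, evenly spaced on the mel scale, converted back to Hz.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr, n_fft=2048, hop=512, n_mels=128):
    """Frame the signal, apply a Hann window, FFT, mel-filter, and convert power to dB."""
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    # Row k holds the sample indices of frame k (no centering or edge padding).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T      # (n_mels, n_frames)


def clip_audio(y, sr, clip_seconds):
    """Split a signal into consecutive, non-overlapping fixed-length clips (remainder dropped)."""
    n = int(clip_seconds * sr)
    if n <= 0:
        raise ValueError("clip_seconds * sr must be at least one sample")
    return [y[i:i + n] for i in range(0, len(y) - n + 1, n)]


if __name__ == "__main__":
    sr = 22050  # assumed sample rate
    t = np.arange(int(3.0 * sr)) / sr
    y = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # synthetic 3 s tone standing in for a song
    clips = clip_audio(y, sr, clip_seconds=1.0)
    specs = [log_mel_spectrogram(c, sr) for c in clips]
    print(len(clips), specs[0].shape)
```

Because the frames are not centered, a 1 s clip at 22,050 Hz yields 40 frames here; librosa's default `center=True` pads the signal and returns a few more frames per clip.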