Color-based Emotion Representation for Speech Emotion Recognition

Ryotaro Nagase; Ryoichi Takashima; Yoichi Yamashita

TL;DR

The paper tackles the limitations of traditional SER representations by introducing a color-based, continuous emotion representation that maps speech to hue, saturation, and value attributes. It builds a crowdsourced annotation pipeline to label color attributes on a Japanese acted speech corpus and demonstrates systematic relationships between color attributes and six categorical emotions. Regression experiments with SVR and DNN show color attributes can be predicted from speech, with HuBERT SSL features providing strong gains; a multitask learning setup further improves both color attribute regression and emotion classification, indicating complementary information across tasks. This framework enables intuitive visualization and potential improvements in practical applications like counseling and e-learning, with avenues for extension to spontaneous and non-Japanese speech in future work.
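To make the multitask idea concrete, the following is a minimal sketch of one common way to combine a regression objective with a classification objective: a weighted sum of mean squared error and cross-entropy. The summary does not state the paper's actual loss, so the function names, the weight `alpha`, and the per-utterance formulation are all illustrative assumptions.

```python
import math


def mse(pred, target):
    """Mean squared error over the color attributes (e.g. saturation, value)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(color_pred, color_true, emo_logits, emo_label, alpha=0.5):
    """Weighted sum of the two task losses.

    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    reg = mse(color_pred, color_true)
    cls = cross_entropy(emo_logits, emo_label)
    return alpha * reg + (1.0 - alpha) * cls
```

With `alpha` close to 1 the model is trained mostly on color attribute regression; with `alpha` close to 0 it is trained mostly on emotion classification.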

Abstract

Speech emotion recognition (SER) has traditionally relied on categorical or dimensional labels. However, these label types are limited in both the diversity of emotions they can represent and their interpretability. To overcome this limitation, we focus on color attributes, such as hue, saturation, and value, to represent emotions as continuous and interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed the annotations. Moreover, we built regression models for color attributes in SER using machine learning and deep learning, and explored multitask learning of color attribute regression and emotion classification. As a result, we demonstrated the relationship between color attributes and emotions in speech, and successfully developed color attribute regression models for SER. We also showed that multitask learning improved the performance of each task.
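As an illustration of why color attributes are easy to interpret, a predicted (hue, saturation, value) triple can be rendered directly as a displayable color. The sketch below uses Python's standard `colorsys` module and assumes all three attributes are normalized to the range [0, 1]; the scale actually used in the annotation is not stated here, and `hsv_to_hex` is a hypothetical helper name.

```python
import colorsys


def hsv_to_hex(hue, saturation, value):
    """Convert an HSV triple to an RGB hex string such as '#ff0000'.

    Assumes each attribute lies in [0, 1] (an assumption of this sketch).
    Hue is wrapped because it is circular; the others are clamped.
    """
    h = hue % 1.0
    s = min(max(saturation, 0.0), 1.0)
    v = min(max(value, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )
```

For example, `hsv_to_hex(0.0, 1.0, 1.0)` returns `"#ff0000"` (fully saturated, bright red), while lowering the saturation moves the result toward gray or white.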

Paper Structure (15 sections, 5 equations, 8 figures, 2 tables)

Introduction
Procedure for annotating emotions with color attributes
Collection and analysis of emotions with color attributes
Emotional speech dataset annotated with color attributes
Analysis of collected color attributes with categorical emotions
Hue
Saturation
Value
Color attribute regression for SER
Experimental setup
Dataset
Models and metrics
Experiment 1: Comparison of color attribute regression results
Experiment 2: Multitask learning of color attribute regression and emotion classification
Conclusion

Figures (8)

Figure 1: Interface for annotating color attributes
Figure 2: Histogram of hue label frequencies per emotion
Figure 3: Distribution of saturation label per emotion
Figure 4: Distribution of value label per emotion
Figure 5: Outline of the models used in Experiments 1 and 2
...and 3 more figures
