Feature Selection via Graph Topology Inference for Soundscape Emotion Recognition

Samuel Rey, Luca Martino, Roberto San Millan, Eduardo Morgado

Abstract

Research on soundscapes has shifted the focus of environmental acoustics from noise levels to the perception of sounds, incorporating contextual factors. Soundscape emotion recognition (SER) models perception using a set of features, with arousal and valence commonly regarded as sufficient descriptors of affect. In this work, we blend \emph{graph learning} techniques with a novel \emph{information criterion} to develop a feature selection framework for SER. Specifically, we estimate a sparse graph representation of feature relations using linear structural equation models (SEM) tailored to the widely used Emo-Soundscapes dataset. The resulting graph captures the relations between input features and the two emotional outputs. To determine the appropriate level of sparsity, we propose a novel \emph{generalized elbow detector}, which provides both a point estimate and an uncertainty interval. We conduct an extensive evaluation of our methods, including visualizations of the inferred relations. While several of our findings align with previous studies, the graph representation also reveals a strong connection between arousal and valence, challenging common SER assumptions.

Paper Structure

This paper contains 16 sections, 15 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Circle of emotion with Arousal and Valence in two cardinal positions (salmeron2012fuzzy).
  • Figure 2: Example of error curve $V(z)$ (blue solid line) and the construction of the areas $A_1$, $A_2$, $A_3$, with three straight lines, with $k_1<k_2$. The curve is sampled in $K=6$ points at $z_0$, $z_1$, ..., $z_K$, depicted with blue circles. For simplicity, we have considered $z_0=0$ and $V(z_K)=0$.
  • Figure 3: The ground-truth intervals $[\lambda_1^*,\lambda_2^*]$ in 100 different realizations are depicted with pink shaded areas. Within these intervals, the $\lambda$ values yield exactly $10$ correctly located zeros in ${\bm \beta}^*_\lambda$ in configuration C1, and $20$ correctly located zeros in ${\bm \beta}^*_\lambda$ in configuration C2. The intervals provided by G-UAED are shown by the yellow shaded areas in (a) for configuration C1 and in (b) for configuration C2. In configuration C1, only 3 runs (highlighted in red) contain a small portion of the G-UAED interval outside the set of suitable values. In configuration C2, the G-UAED intervals always contain suitable values.
  • Figure 4: The normalized error as a function of the number of links in the graph, in the 4 different scenarios. Note that the points with zero links correspond to $\lambda_{\texttt{max}}$. The red points and red lines in each figure correspond to the elbows and intervals obtained by applying the G-UAED.
  • Figure 5: Graph representation and associated adjacency matrix of edges involving the output variables (valence and arousal). Connectivity between output variables is allowed. (a) Using the elbow value $\lambda=159.78$; (b) $\lambda=300$; (c) $\lambda=360$; (d) using the largest value within our interval, $\lambda=472.59$.
  • ...and 4 more figures