Iterative Feature Boosting for Explainable Speech Emotion Recognition

Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

TL;DR

This work tackles high-dimensionality in speech emotion recognition by introducing an iterative, explainable feature boosting framework. It combines a feature boosting module, a supervised classifier, and a SHAP-based explainability module to iteratively refine feature sets, reporting state-of-the-art performance on the TESS dataset with an Extra Trees classifier achieving up to 98.7% accuracy. The approach emphasizes transparency, using SHAP to identify and justify feature contributions and to guide successive feature selections. While showing strong results on TESS, the authors acknowledge the need for validation on additional datasets and propose extending the framework to deep learning in future work, aiming for robust, interpretable SER applicable to real-world conditions.
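The loop described above (train a classifier, explain it with Shapley values, prune the feature set, retrain) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It replaces the Extra Trees classifier with a hand-written logistic regression. It replaces the SHAP library with the closed-form Shapley values of a linear logit, w_j * (x_j - mean_j), which are exact when features are treated as independent. It also uses a hypothetical rule: drop one feature per round and keep the set with the best validation accuracy. The data are a synthetic toy set, not TESS, and every function name is hypothetical.

```python
import math
import random


def make_toy_data(n=200, n_informative=3, n_noise=5, seed=0):
    """Hypothetical toy data: the first n_informative columns depend on the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        X.append([rng.gauss(centre, 1.0) for _ in range(n_informative)]
                 + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, feats, epochs=100, lr=0.5):
    """Batch gradient-descent logistic regression on the columns listed in feats (stand-in classifier)."""
    w, b, n = [0.0] * len(feats), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(feats), 0.0
        for row, t in zip(X, y):
            err = sigmoid(b + sum(wj * row[f] for wj, f in zip(w, feats))) - t
            for j, f in enumerate(feats):
                gw[j] += err * row[f]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, feats, w, b):
    hits = sum((b + sum(wj * row[f] for wj, f in zip(w, feats)) >= 0) == (t == 1)
               for row, t in zip(X, y))
    return hits / len(y)


def linear_shap_importance(X, feats, w):
    """Mean |Shapley value| per feature for a linear logit, using the data mean as baseline.

    For a linear model with independent features, phi_j(x) = w_j * (x_j - mean_j).
    """
    n = len(X)
    importance = {}
    for wj, f in zip(w, feats):
        mean_f = sum(row[f] for row in X) / n
        importance[f] = sum(abs(wj * (row[f] - mean_f)) for row in X) / n
    return importance


def iterative_feature_boosting(X_tr, y_tr, X_val, y_val, drop_per_round=1, min_features=1):
    feats = list(range(len(X_tr[0])))
    w, b = train_logreg(X_tr, y_tr, feats)
    history = [(list(feats), accuracy(X_val, y_val, feats, w, b))]
    best_feats, best_acc = history[0]
    while len(feats) > min_features:
        importance = linear_shap_importance(X_tr, feats, w)
        k = min(drop_per_round, len(feats) - min_features)
        # drop the k features with the smallest mean |Shapley value|, then retrain on the rest
        dropped = set(sorted(feats, key=importance.get)[:k])
        feats = [f for f in feats if f not in dropped]
        w, b = train_logreg(X_tr, y_tr, feats)
        acc = accuracy(X_val, y_val, feats, w, b)
        history.append((list(feats), acc))
        if acc >= best_acc:  # ties favour the smaller feature set
            best_feats, best_acc = list(feats), acc
    return best_feats, best_acc, history


X, y = make_toy_data()
best_feats, best_acc, history = iterative_feature_boosting(X[:150], y[:150], X[150:], y[150:])
print("kept features:", best_feats, "validation accuracy:", round(best_acc, 3))
```

Each round re-explains the retrained model rather than reusing the first ranking. This reflects the iterative refinement idea: a feature's attribution can change once correlated or noisy companions are removed.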

Abstract

In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high-dimensional datasets that include redundant and irrelevant information. Consequently, high-dimensional learning often decreases model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through a feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach thus balances model performance and transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset. The source code for this paper is publicly available at https://github.com/alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognition.
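Shapley values attribute a prediction f(x) to each feature i by averaging that feature's marginal contribution over all coalitions S of the other features: phi_i = sum over S of |S|! (n - |S| - 1)! / n! * [v(S with i) - v(S)]. The sketch below computes them exactly by enumeration. It assumes that a single hypothetical baseline vector stands in for "absent" features. The model `toy_model` and its inputs are illustrative only and are not SER features. Exact enumeration needs on the order of 2^n model evaluations, which is why libraries such as SHAP rely on model-specific algorithms (for example, for tree ensembles) or sampling approximations at realistic feature counts.

```python
import math
from itertools import combinations


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        # v(S) = f evaluated with x on S and the baseline elsewhere
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                gain = value(s | {i}) - value(s)
                # Shapley weight |S|! (n - |S| - 1)! / n!, multiplied before dividing
                phi[i] += math.factorial(size) * math.factorial(n - size - 1) * gain / math.factorial(n)
    return phi


def toy_model(z):  # hypothetical model: an additive term plus an interaction
    return 2 * z[0] + z[1] * z[2]


phi = exact_shapley(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # → [2.0, 3.0, 3.0]
```

The attributions sum to f(x) - f(baseline) = 8 (the efficiency property). Feature 0 receives its additive contribution of 2, and the interaction term x1 * x2 = 6 is split equally between features 1 and 2.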

Paper Structure

This paper contains 13 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The proposed method
  • Figure 2: Boosted features importance
  • Figure 3: Initial features importance
  • Figure 4: Biplot of TESS optimal feature combination
  • Figure 5: Confusion matrices for the Extra Trees classifier. Values in "( )": without feature boosting and model explainability. Values in "[ ]": our framework with feature boosting and model explainability