SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Aron R; Indra Sigicharla; Chirag Periwal; Mohanaprasad K; Nithya Darisini P S; Sourabh Tiwari; Shivani Arora

TL;DR

This work tackles the challenge of predicting age, gender, and emotion from speech using a unified multi-output architecture, SEGAA. By merging CREMA-D and EMO-DB to obtain triple-labeled data and applying extensive feature extraction with data augmentation, the authors compare univariate, multi-output, and sequential approaches. Results show that the multi-output SEGAA model performs close to independent models while running more efficiently, whereas sequential cascades tend to propagate errors. The study provides evidence that leveraging interdependencies among vocal attributes can yield robust, efficient predictions suitable for real-time applications in diverse domains.

Abstract

Interpreting the human voice matters across many applications. This study predicts age, gender, and emotion from vocal cues. Advances in voice analysis technology span many domains, from improving customer interactions to enhancing healthcare and retail experiences. Recognizing emotion can support mental health care, while age and gender detection are useful in many contexts. This paper explores deep learning models for these predictions, comparing single, multi-output, and sequential models. Sourcing suitable data posed challenges, which led to merging the CREMA-D and EMO-DB datasets. Prior work showed promise in individual predictions, but little research has considered all three variables simultaneously. This paper identifies flaws in the individual-model approach and advocates a novel multi-output learning architecture, the Speech-based Emotion Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between the variables and speech inputs, while achieving improved runtime.
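
To make the multi-output idea concrete, the following is a minimal sketch of a shared feature trunk feeding three classification heads (age, gender, emotion). It is not the authors' SEGAA architecture: the layer sizes, the class counts in `HEAD_SIZES`, the 8-dimensional input, and the untrained random weights are all assumptions chosen only for illustration.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiOutputSketch:
    """Hypothetical shared-trunk model with one softmax head per attribute."""

    def __init__(self, n_features, n_hidden, head_sizes, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, n_features, n_hidden)
        self.heads = {name: init_layer(rng, n_hidden, n_classes)
                      for name, n_classes in head_sizes.items()}

    def predict_proba(self, features):
        # The shared representation is computed once and reused by every head.
        shared = relu(dense(features, *self.trunk))
        return {name: softmax(dense(shared, *params))
                for name, params in self.heads.items()}


# Assumed class counts, not taken from the paper.
HEAD_SIZES = {"age": 4, "gender": 2, "emotion": 6}

model = MultiOutputSketch(n_features=8, n_hidden=16, head_sizes=HEAD_SIZES)
probs = model.predict_proba([0.5] * 8)  # stand-in for extracted speech features
for attribute, distribution in probs.items():
    print(attribute, [round(p, 3) for p in distribution])
```

Because the trunk runs once per input and all three heads reuse its output, a single forward pass yields all three predictions. Three independent models would each repeat the feature-processing work, and a sequential cascade would feed one model's possibly wrong prediction into the next.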
