Emotion Recognition and Generation: A Comprehensive Review of Face, Speech, and Text Modalities

Rebecca Mobbs; Dimitrios Makris; Vasileios Argyriou

TL;DR

This survey addresses the recognition and generation of human emotions across facial, vocal, and textual modalities. It reviews state-of-the-art methods for both tasks, detailing preprocessing, datasets, architectures, and evaluation metrics, with emphasis on cross-modal integration and controllable generation. Key contributions include a comprehensive taxonomy of FER, SER, TSR, FEG, SEG, and TSG approaches, an analysis of evaluation frameworks, and a discussion of challenges such as data bias and ethical considerations. The work highlights the practical value of robust, multimodal, and ethically responsible emotion-aware AI for healthcare, customer service, and interactive agents, and outlines future directions including standardized benchmarks and multimodal fusion strategies.

Abstract

Emotion recognition and generation have emerged as crucial topics in Artificial Intelligence research, playing a significant role in enhancing human-computer interaction within healthcare, customer service, and other fields. Although several reviews have been conducted on emotion recognition and generation as separate entities, many of these works are either fragmented or limited to specific methodologies, lacking a comprehensive overview of recent developments and trends across different modalities. In this survey, we provide a holistic review aimed at researchers beginning their exploration in emotion recognition and generation. We introduce the fundamental principles underlying emotion recognition and generation across facial, vocal, and textual modalities. This work categorises recent state-of-the-art research into distinct technical approaches and explains the theoretical foundations and motivations behind these methodologies, offering a clearer understanding of their application. Moreover, we discuss evaluation metrics, comparative analyses, and current limitations, shedding light on the challenges faced by researchers in the field. Finally, we propose future research directions to address these challenges and encourage further exploration into developing robust, effective, and ethically responsible emotion recognition and generation systems.

Figures (6)