
A Survey of Deep Learning for Group-level Emotion Recognition

Xiaohua Huang, Jinke Xu, Wenming Zheng, Qirong Mao, Abhinav Dhall

TL;DR

The paper surveys deep learning approaches for group-level emotion recognition (GER), detailing a taxonomy of DL methods, available image- and video-based datasets, input modalities, network architectures, fusion strategies, and loss functions. It highlights how single- and multi-stream architectures, cascade models, graph-based networks, and attention mechanisms have driven progress, with multimodal fusion (face, scene, skeleton, audio) yielding the best results in EmotiW benchmarks. The review also analyzes evaluation protocols and performance trends, and discusses practical challenges such as data scarcity, annotation bias, and the need for robust cross-modal fusion. Collectively, the work provides a comprehensive roadmap for designing robust GER systems and guides future research toward larger, more diverse datasets, unsupervised/self-supervised learning, continual adaptation, and principled multimodal evaluation.

Abstract

With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. Early GER methods relied primarily on handcrafted features. However, with the proliferation of deep learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. Unlike an individual's emotion, group emotions exhibit diversity and dynamics. Presently, several DL approaches have been proposed to effectively leverage the rich information inherent in group-level images and significantly enhance GER performance. In this survey, we present a comprehensive review of DL techniques applied to GER, proposing a new taxonomy for the field that covers all aspects of GER based on DL. The survey overviews datasets, the deep GER pipeline, and performance comparisons of the state-of-the-art methods of the past decade. Moreover, it summarizes and discusses the fundamental approaches and advanced developments for each aspect. Furthermore, we identify outstanding challenges and suggest potential avenues for the design of robust GER systems. To the best of our knowledge, this survey represents the first comprehensive review of deep GER methods, serving as a pivotal reference for future GER research endeavors.

Paper Structure (29 sections, 2 figures, 4 tables)


Figures (2)

  • Figure 1: Overview of deep-learning-based technical papers and survey papers for group-level emotion recognition. Best viewed in color.
  • Figure 2: Basic blocks for GER: (a) Convolution block; (b) Recurrent neural network (RNN); (c) Cascade network; (d) Graph Convolutional Network (GCN).