A Survey on Facial Expression Recognition of Static and Dynamic Emotions
Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan
TL;DR
This survey comprehensively maps the landscape of facial expression recognition by separately analyzing static FER (SFER) and dynamic FER (DFER), detailing datasets, workflows, and the main challenges of each: eight for SFER and seven for DFER. It surveys a wide spectrum of model families (CNNs, GCNs, Transformers) and strategies (disturbance-invariance, uncertainty handling, cross-domain/adaptation, weak supervision, and cross-modal fusion) across both image and video modalities, including 3D FER and multimodal approaches with visual-language alignment. The paper also analyzes recent advances on in-the-lab and in-the-wild benchmarks, discusses applications in health, education, and HCI, and highlights ethical concerns, bias, and privacy issues. Finally, it outlines development trends such as zero-shot FER, embodied FER, and multimodal large-language-model–assisted approaches, offering future directions and a public project page for resources and code.
Abstract
Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed, and new challenges and approaches have emerged that are not well addressed in existing reviews of FER. This paper offers a comprehensive survey of both image-based static FER (SFER) and video-based dynamic FER (DFER) methods, with the analysis moving from model-oriented development to challenge-focused categorization. We begin with a critical comparison of recent reviews, an introduction to common datasets and evaluation criteria, and an in-depth workflow on FER to establish a robust research foundation. We then systematically review representative approaches addressing eight main challenges in SFER (such as expression disturbance, uncertainties, compound emotions, and cross-domain inconsistency) as well as seven main challenges in DFER (such as key frame sampling, expression intensity variations, and cross-modal alignment). Additionally, we analyze recent advancements, benchmark performances, major applications, and ethical considerations. Finally, we propose five promising future directions and development trends to guide ongoing research. The project page for this paper can be found at https://github.com/wangyanckxx/SurveyFER.
