EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Christoph Schuhmann; Robert Kaczmarczyk; Gollam Rabby; Felix Friedrich; Maurice Kraus; Krishna Kalyan; Kourosh Nadi; Huu Nguyen; Kristian Kersting; Sören Auer


TL;DR

EmoNet-Face tackles the limited emotional repertoire and demographic biases of existing benchmarks by introducing a fine-grained 40-category emotion taxonomy grounded in established psychological research. It delivers three synthetic, expert-annotated datasets (Big for pretraining, Binary for fine-tuning, and HQ for evaluation) with controlled demographic balance across ethnicity, age, and gender. The authors train EmpathicInsight-Face, a specialized model that achieves near-human performance on the HQ benchmark, and show that general-purpose VLMs fall well short on nuanced facial emotion recognition. By openly releasing the taxonomy, datasets, and models, the work provides a robust foundation for advancing emotion-aware AI, while emphasizing ethical considerations and the need for multimodal integration in future research.

Abstract

Effective human-AI interaction relies on AI's ability to accurately perceive and interpret human emotions. Current benchmarks for vision and vision-language models are severely limited: they cover a narrow emotional spectrum that overlooks nuanced states (e.g., bitterness, intoxication) and fail to distinguish subtle differences between related feelings (e.g., shame vs. embarrassment). Existing datasets also often rely on uncontrolled imagery with occluded faces and lack demographic diversity, risking significant bias. To address these gaps, we introduce EmoNet-Face, a comprehensive benchmark suite. EmoNet-Face features: (1) a novel 40-category emotion taxonomy, carefully derived from foundational research to capture finer details of human emotional experience; (2) three large-scale, AI-generated datasets (EmoNet HQ, Binary, and Big) with explicit, full-face expressions and controlled demographic balance across ethnicity, age, and gender; (3) rigorous multi-expert annotations for training and high-fidelity evaluation; and (4) EmpathicInsight-Face, a model that achieves human-expert-level performance on our benchmark. The publicly released EmoNet-Face suite (taxonomy, datasets, and model) provides a robust foundation for developing and evaluating AI systems with a deeper understanding of human emotions.
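"Controlled demographic balance" for synthetic data can be made concrete with a generation grid in which every combination of demographic attributes appears equally often for every emotion. The Python sketch below illustrates that idea only; it is not the authors' documented pipeline. The attribute levels (`ETHNICITIES`, `AGE_GROUPS`, `GENDERS`), the helper `balanced_grid`, and the spec dictionary format are all hypothetical, and the four emotion labels are simply the examples named in the abstract, used here for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical attribute levels; placeholders, not the paper's actual categories.
ETHNICITIES = ["group_a", "group_b", "group_c"]
AGE_GROUPS = ["young adult", "middle-aged", "older adult"]
GENDERS = ["female", "male"]
# Illustrative emotion labels taken from the abstract's examples.
EMOTIONS = ["shame", "embarrassment", "bitterness", "intoxication"]


def balanced_grid(emotions, images_per_cell=1):
    """Hypothetical helper: return one generation spec per
    (emotion, ethnicity, age, gender) cell, repeated images_per_cell times,
    so every demographic combination appears equally often per emotion."""
    specs = []
    for emotion, eth, age, gender in product(emotions, ETHNICITIES, AGE_GROUPS, GENDERS):
        for _ in range(images_per_cell):
            specs.append({"emotion": emotion, "ethnicity": eth, "age": age, "gender": gender})
    return specs


specs = balanced_grid(EMOTIONS, images_per_cell=2)
# Count how often each demographic combination occurs across all specs.
counts = Counter((s["ethnicity"], s["age"], s["gender"]) for s in specs)
print(len(specs))              # 4 emotions * 3 * 3 * 2 cells * 2 images = 144
print(sorted(set(counts.values())))  # [8]: every demographic cell is equally represented
```

Enumerating the full Cartesian product guarantees balance by construction; in practice, generated images that fail quality checks would be dropped, so a real pipeline would also need to re-check the counts after filtering.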

