REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge

Siyang Song, Micol Spitale, Xiangyu Kong, Hengde Zhu, Cheng Luo, Cristina Palmero, German Barquero, Sergio Escalera, Michel Valstar, Mohamed Daoudi, Tobias Baur, Fabien Ringeval, Andrew Howes, Elisabeth Andre, Hatice Gunes

TL;DR

REACT 2025 addresses the one-to-many problem of generating multiple appropriate listener facial reactions to a speaker's behaviour. It introduces the MARS dataset and two sub-tasks: offline and online Multiple Appropriate Facial Reaction Generation (MAFRG). It provides a transformer-based variational baseline (Trans-VAE), a diffusion-based baseline (PerFRDiff), and a reversible GNN baseline (REGNN) for modelling diverse, realistic, and synchronized appropriate facial reactions (AFRs). Generated reactions are evaluated on 25 facial attributes and 2D video using metrics for appropriateness, diversity, realism, and synchrony. The dataset provides multi-modal data (audio, video, EEG) with ground-truth AFRs and personality traits collected in a controlled multi-topic setting, which supports variable-length inputs and consistent benchmarking. Overall, REACT 2025 establishes a standardized, open framework for generative modelling in social-communication AI, with the aim of more natural humanoid agents in dyadic interactions.

Abstract

In dyadic interactions, a broad spectrum of human facial reactions might be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, we propose the REACT 2025 challenge, which encourages the development and benchmarking of Machine Learning (ML) models that generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., audio-visual behaviours expressed by their corresponding speakers). As a key part of the challenge, we provide participants with the first natural and large-scale multi-modal Multiple Appropriate Facial Reaction Generation (MAFRG) dataset, called MARS, which records 137 human-human dyadic interactions containing a total of 2856 interaction sessions covering five different topics. This paper also presents the challenge guidelines and the performance of our baselines on the two proposed sub-challenges: Offline MAFRG and Online MAFRG. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2025

Paper Structure

This paper contains 7 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Illustration of the data collection scenario of the MARS dataset. The left side outlines the preparatory steps, including protocol design, ethical approval, scheduling, obtaining participant consent, and completing a personality questionnaire. The right side illustrates the physical data collection setup, where a human speaker and a human listener each sit in front of a PC and hold a video conference across several pre-defined interaction tasks.
  • Figure 2: Statistics of participants' ethnic groups.
  • Figure 3: Overview of the Trans-VAE baseline.
  • Figure 4: Overview of the PerFRDiff baseline.
  • Figure 5: Overview of the REGNN baseline.