MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering

Xinqi Fan; Jingting Li; John See; Moi Hoon Yap; Wen-Huang Cheng; Xiaobai Li; Xiaopeng Hong; Su-Jing Wang; Adrian K. Davision

TL;DR

MEGC2025 presents two concurrent ME-analysis tasks aimed at more end-to-end ME understanding: ME-STR unifies micro-expression spotting and recognition in a sequential pipeline, while ME-VQA uses multimodal language models to answer questions about ME content. With MEAN as the STR baseline and Qwen2.5-VL-3B as the VQA baseline, the challenge shows that temporal localisation remains the bottleneck for STR (best STRS of around 0.09), whereas LVLM-based VQA performs more strongly (average score of around 0.575) when given suitable temporal inputs such as onset, apex, and offset frames or optical flow. Evaluations on test sets derived from SAMM and CAS(ME)^3 demonstrate both the feasibility and the current limits of integrated ME analysis, and they underscore the need for larger, more diverse ME datasets to improve generalisation. Overall, MEGC2025 advances ME research by combining end-to-end spotting and recognition with language-driven reasoning, and it promotes reproducibility through public leaderboards and collaboration between computer vision and psychology.
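The STR results above are reported as a spot-then-recognize score (STRS). The following is a minimal Python sketch of how such a score can be computed. It rests on several assumptions that are not taken from this page, and it is not the official MEGC2025 evaluation code:

* A predicted interval counts as a spotting hit when its temporal intersection-over-union (IoU) with an unmatched ground-truth interval is at least 0.5.
* Recognition is scored as a macro-averaged F1 over the matched intervals only.
* STRS is the product of the spotting F1 and the recognition F1.

The names `iou`, `match_intervals`, and `strs` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical interval format: (onset_frame, offset_frame, emotion_label), inclusive frames.
Interval = Tuple[int, int, str]


def iou(a: Interval, b: Interval) -> float:
    """Temporal intersection-over-union of two inclusive frame intervals."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def match_intervals(preds: List[Interval], gts: List[Interval],
                    thresh: float = 0.5) -> List[Tuple[Interval, Interval]]:
    """Greedily pair each prediction with its highest-IoU unmatched ground truth,
    keeping the pair only if IoU >= thresh (assumed threshold)."""
    used = set()
    pairs = []
    for p in preds:
        best_j, best_iou = None, thresh
        for j, g in enumerate(gts):
            if j in used:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((p, gts[best_j]))
    return pairs


def strs(preds: List[Interval], gts: List[Interval], thresh: float = 0.5) -> float:
    """Assumed STRS: spotting F1 multiplied by macro recognition F1 on matched intervals."""
    pairs = match_intervals(preds, gts, thresh)
    tp = len(pairs)
    spot_f1 = f1(tp, len(preds) - tp, len(gts) - tp)
    labels = {g[2] for _, g in pairs} | {p[2] for p, _ in pairs}
    per_class = []
    for c in labels:
        c_tp = sum(1 for p, g in pairs if p[2] == c and g[2] == c)
        c_fp = sum(1 for p, g in pairs if p[2] == c and g[2] != c)
        c_fn = sum(1 for p, g in pairs if p[2] != c and g[2] == c)
        per_class.append(f1(c_tp, c_fp, c_fn))
    rec_f1 = sum(per_class) / len(per_class) if per_class else 0.0
    return spot_f1 * rec_f1
```

Consider two ground-truth MEs, `(10, 30, "positive")` and `(100, 120, "negative")`, and the predictions `(12, 30, "positive")`, `(100, 118, "surprise")`, and `(200, 210, "negative")`. Two spots match, which gives a spotting F1 of 0.8. Only one matched label is correct, so the recognition F1 is 1/3, and the sketch returns about 0.267. Because the two F1 values are multiplied, a weak spotter caps the overall score regardless of how good the recognizer is. This is consistent with the observation that temporal localisation is the bottleneck.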

Abstract

Facial micro-expressions (MEs) are involuntary facial movements that occur spontaneously when a person experiences an emotion but attempts to suppress or repress its expression, typically in high-stakes situations. In recent years, substantial advances have been made in ME recognition, spotting, and generation. However, conventional approaches that treat spotting and recognition as separate tasks are suboptimal, particularly for analyzing long-duration videos in realistic settings. Concurrently, the emergence of multimodal large language models (MLLMs) and large vision-language models (LVLMs) offers promising new avenues for enhancing ME analysis through their powerful multimodal reasoning capabilities. The ME grand challenge (MEGC) 2025 introduces two tasks that reflect these evolving research directions: (1) ME spot-then-recognize (ME-STR), which integrates ME spotting and subsequent recognition in a unified sequential pipeline; and (2) ME visual question answering (ME-VQA), which explores ME understanding through visual question answering, leveraging MLLMs or LVLMs to address diverse question types related to MEs. All participating algorithms are required to run on the challenge test set and submit their results to a leaderboard. More details are available at https://megc2025.github.io.
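The sequential structure of ME-STR can be illustrated with a small sketch. In this sketch, a spotter proposes intervals in a long video, and a recognizer then classifies only the clips it proposed. The following are hypothetical stand-ins, not part of the challenge definition:

* the per-frame "ME-ness" scores;
* the fixed threshold and minimum-length filter;
* the `recognize` callable.

A real entry such as the MEAN baseline learns both stages.

```python
from typing import Callable, List, Sequence, Tuple


def spot_intervals(scores: Sequence[float], threshold: float = 0.5,
                   min_len: int = 3) -> List[Tuple[int, int]]:
    """Stage 1 (hypothetical spotter): turn per-frame scores into inclusive
    (onset, offset) intervals where the score stays above `threshold`,
    discarding runs shorter than `min_len` frames."""
    intervals = []
    start = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start >= min_len:
                intervals.append((start, i - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(scores) - start >= min_len:
        intervals.append((start, len(scores) - 1))
    return intervals


def spot_then_recognize(frames: Sequence, scores: Sequence[float],
                        recognize: Callable[[Sequence], str],
                        threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Stage 2: classify only the clips the spotter proposed, so a missed or
    badly bounded spot directly changes what the recognizer sees."""
    results = []
    for onset, offset in spot_intervals(scores, threshold):
        label = recognize(frames[onset:offset + 1])
        results.append((onset, offset, label))
    return results
```

With the scores `[0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.1, 0.9, 0.95, 0.9, 0.8]`, the spotter proposes `(2, 4)` and `(8, 11)`. It drops the single-frame spike at index 6. Only those two clips reach the recognizer. This coupling is what distinguishes the unified pipeline from evaluating recognition on pre-segmented ME clips.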
