7ABAW-Compound Expression Recognition via Curriculum Learning
Chen Liu, Feng Qiu, Wei Zhang, Lincheng Li, Dadong Wang, Xin Yu
TL;DR
This work tackles Compound Expression (CE) recognition under limited labeled data with a curriculum-learning framework that first trains on single-expression data and then progressively incorporates compound expressions through synthetic augmentation. A Masked Autoencoder (MAE) backbone is pretrained on large-scale face datasets and fine-tuned for affective recognition, while CutMix and Mixup generate diverse compound-expression samples to compensate for the lack of annotated compound data. Training is staged: the proportion of compound data increases gradually, and the probabilities of basic expressions are aggregated to form compound predictions, with the model optimized using a binary cross-entropy multi-label loss. The approach achieves the best official result on the ABAW7 CE track (F1 = 0.6063), suggesting that structured learning of gradually increasing complexity, combined with augmented data, improves generalization to complex affective states.
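As a rough illustration of the staged schedule, the multi-label loss, and the aggregation step summarized above, the following NumPy sketch may help. The linear ramp in `compound_ratio`, the `max_ratio` value, the summing rule in `predict_compound`, and the three compound classes listed are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

# Basic expression classes (order is an assumption made for this sketch).
BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical subset of compound classes, each tied to two basic expressions.
COMPOUND = {
    "fearfully_surprised": ("fear", "surprise"),
    "happily_surprised": ("happiness", "surprise"),
    "sadly_angry": ("sadness", "anger"),
}


def compound_ratio(epoch, total_epochs, max_ratio=0.8):
    """Assumed linear curriculum: share of compound samples grows from 0 to max_ratio."""
    if total_epochs <= 1:
        return max_ratio
    return max_ratio * min(epoch, total_epochs - 1) / (total_epochs - 1)


def bce_multilabel(logits, targets, eps=1e-7):
    """Binary cross-entropy averaged over all classes, one sigmoid per class."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    loss = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(loss))


def predict_compound(basic_probs):
    """Score each compound class by summing its two basic probabilities (assumed rule)."""
    scores = {
        name: basic_probs[BASIC.index(a)] + basic_probs[BASIC.index(b)]
        for name, (a, b) in COMPOUND.items()
    }
    return max(scores, key=scores.get)


# Usage: per-class sigmoid outputs for one face image.
probs = np.array([0.10, 0.05, 0.70, 0.10, 0.20, 0.80])
label = predict_compound(probs)
```

Summing the two constituent probabilities is only one plausible aggregation; a product or a learned head would fit the same interface.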
Abstract
With the advent of deep learning, expression recognition has made significant advancements. However, due to the limited availability of annotated compound expression datasets and the subtle variations of compound expressions, Compound Expression (CE) recognition still holds considerable potential for exploration. To advance this task, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition introduces the Compound Expression Challenge based on C-EXPR-DB, a limited dataset without labels. In this paper, we present a curriculum learning-based framework that initially trains the model on single-expression tasks and subsequently incorporates multi-expression data. This design ensures that our model first masters the fundamental features of basic expressions before being exposed to the complexities of compound emotions. Specifically, our design comprises three components: 1) Single-Expression Pre-training: The model is first trained on datasets containing single expressions to learn the foundational facial features associated with basic emotions. 2) Dynamic Compound Expression Generation: Given the scarcity of annotated compound expression datasets, we apply CutMix and Mixup to the original single-expression images to create hybrid images exhibiting characteristics of multiple basic emotions. 3) Incremental Multi-Expression Integration: Once the model performs well on single-expression tasks, it is progressively exposed to multi-expression data, allowing it to adapt to the complexity and variability of compound expressions. The official results indicate that our method achieves the best performance in this competition track with an F-score of 0.6063. Our code is released at https://github.com/YenanLiu/ABAW7th.
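To make the Dynamic Compound Expression Generation step more concrete, here is a minimal NumPy sketch of how Mixup and CutMix could blend two single-expression images into one hybrid sample with a multi-hot target. The function names, the multi-hot (rather than soft) labeling, and the box-sampling rule are assumptions for illustration; the released code may differ.

```python
import numpy as np


def multi_hot(class_a, class_b, num_classes):
    """Target marking both source expressions as present (assumed labeling scheme)."""
    y = np.zeros(num_classes, dtype=np.float32)
    y[[class_a, class_b]] = 1.0
    return y


def mixup(img_a, img_b, lam):
    """Pixel-wise convex combination of two images with weight lam in [0, 1]."""
    return lam * img_a + (1.0 - lam) * img_b


def cutmix(img_a, img_b, lam, rng):
    """Paste a random box from img_b into img_a; the box covers about (1 - lam) of the area."""
    h, w = img_a.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    top = rng.integers(0, h - cut_h + 1)   # upper bound is exclusive
    left = rng.integers(0, w - cut_w + 1)
    out = img_a.copy()
    out[top:top + cut_h, left:left + cut_w] = img_b[top:top + cut_h, left:left + cut_w]
    return out


# Usage with placeholder images standing in for two single-expression faces.
rng = np.random.default_rng(0)
face_a = np.zeros((8, 8, 3), dtype=np.float32)  # e.g. a "fear" image
face_b = np.ones((8, 8, 3), dtype=np.float32)   # e.g. a "surprise" image
blended = mixup(face_a, face_b, lam=0.25)
patched = cutmix(face_a, face_b, lam=0.75, rng=rng)
target = multi_hot(2, 5, num_classes=6)
```

In practice `lam` is often drawn from a Beta distribution per sample; it is passed explicitly here to keep the sketch deterministic.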
