7ABAW-Compound Expression Recognition via Curriculum Learning

Chen Liu, Feng Qiu, Wei Zhang, Lincheng Li, Dadong Wang, Xin Yu

TL;DR

This work tackles Compound Expression Recognition under limited labeled data by introducing a curriculum-learning framework that first trains on single-expression data and then progressively incorporates compound expressions, including synthetically generated ones. A Masked Autoencoder (MAE) backbone is pretrained on large-scale face datasets and fine-tuned for affective recognition, while CutMix and Mixup generate diverse compound-expression samples to compensate for the scarcity of annotated compound data. Training is staged: the proportion of compound data increases gradually, and the probabilities of basic expressions are aggregated to form compound predictions, all optimized with a Binary Cross-Entropy multi-label loss. The approach achieves the best official result on the ABAW7 CE track (F1 = 0.6063), suggesting that a structured, progressively harder curriculum combined with augmented data improves generalization to complex affective states.
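The aggregation and loss mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the order of the basic classes, the list of compound classes and their constituent pairs, the use of a simple sum as the aggregation rule, and the function names `bce_multilabel` and `predict_compound` are all assumptions made for this example.

```python
import numpy as np

# Assumed class order for the basic-expression head (hypothetical).
BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Assumed compound classes, each mapped to two basic constituents (hypothetical).
COMPOUND = {
    "fearfully_surprised": ("fear", "surprise"),
    "happily_surprised": ("happiness", "surprise"),
    "sadly_surprised": ("sadness", "surprise"),
    "disgustedly_surprised": ("disgust", "surprise"),
    "angrily_surprised": ("anger", "surprise"),
    "sadly_fearful": ("sadness", "fear"),
    "sadly_angry": ("sadness", "anger"),
}


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def bce_multilabel(logits, targets, eps=1e-7):
    """Binary cross-entropy averaged over independent per-class outputs."""
    p = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))


def predict_compound(logits):
    """Score each compound class by summing its two basic probabilities
    (the sum is an assumed aggregation rule) and return the best one."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    scores = {
        name: probs[BASIC.index(a)] + probs[BASIC.index(b)]
        for name, (a, b) in COMPOUND.items()
    }
    return max(scores, key=scores.get)


# Example: high logits for "fear" and "surprise".
example_logits = [-3.0, -3.0, 2.0, -3.0, -3.0, 2.5]
print(predict_compound(example_logits))
```

Because each basic class has its own sigmoid output, a single image can activate two classes at once, which is what makes a multi-label loss a natural fit for compound expressions.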

Abstract

With the advent of deep learning, expression recognition has made significant advancements. However, due to the limited availability of annotated compound expression datasets and the subtle variations of compound expressions, Compound Emotion Recognition (CE) still holds considerable potential for exploration. To advance this task, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition introduces the Compound Expression Challenge based on C-EXPR-DB, a limited dataset without labels. In this paper, we present a curriculum learning-based framework that initially trains the model on single-expression tasks and subsequently incorporates multi-expression data. This design ensures that our model first masters the fundamental features of basic expressions before being exposed to the complexities of compound emotions. Specifically, our designs can be summarized as follows: 1) Single-Expression Pre-training: The model is first trained on datasets containing single expressions to learn the foundational facial features associated with basic emotions. 2) Dynamic Compound Expression Generation: Given the scarcity of annotated compound expression datasets, we employ CutMix and Mixup techniques on the original single-expression images to create hybrid images exhibiting characteristics of multiple basic emotions. 3) Incremental Multi-Expression Integration: After performing well on single-expression tasks, the model is progressively exposed to multi-expression data, allowing the model to adapt to the complexity and variability of compound expressions. The official results indicate that our method achieves the best performance in this competition track with an F-score of 0.6063. Our code is released at https://github.com/YenanLiu/ABAW7th.
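The compound expression generation step (item 2 above) can be sketched with plain NumPy. This is a hedged illustration rather than the released code: the function names `mixup` and `cutmix`, the fixed mixing coefficient `lam`, and the choice to label the hybrid image with the multi-hot union of the two source labels are assumptions made for this example.

```python
import numpy as np


def mixup(img_a, img_b, label_a, label_b, lam=0.5):
    """Blend two images pixel-wise; the label is the union of both
    multi-hot labels (an assumed labeling scheme)."""
    mixed = lam * img_a + (1.0 - lam) * img_b
    label = np.maximum(label_a, label_b)
    return mixed, label


def cutmix(img_a, img_b, label_a, label_b, rng, lam=0.5):
    """Paste a rectangular patch of img_b into img_a; the patch covers
    roughly a (1 - lam) fraction of the image before border clipping."""
    h, w = img_a.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)  # patch center
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    label = np.maximum(label_a, label_b)
    return mixed, label


# Toy usage: two 8x8 RGB "images" with one-hot labels over 6 basic classes.
rng = np.random.default_rng(0)
img_fear = np.zeros((8, 8, 3), dtype=np.float32)
img_surprise = np.ones((8, 8, 3), dtype=np.float32)
lab_fear = np.array([0, 0, 1, 0, 0, 0], dtype=np.float32)
lab_surprise = np.array([0, 0, 0, 0, 0, 1], dtype=np.float32)

mix_img, mix_lab = mixup(img_fear, img_surprise, lab_fear, lab_surprise)
cut_img, cut_lab = cutmix(img_fear, img_surprise, lab_fear, lab_surprise, rng)
```

In both cases the resulting label marks two basic expressions as present, so the hybrid image can be trained with the same multi-label objective as natural compound data.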

Paper Structure

This paper contains 20 sections, 2 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Illustration of our proposed framework for the compound expression recognition competition. We adopt a curriculum learning approach, transitioning from basic expression prediction to compound expression learning. We take the second training stage as an example to illustrate the transition. Unlike the first training stage, which uses only basic expression data, the second stage also randomly selects compound expression data from the natural compound expression datasets (i.e., RAF-DB and Fuxi-EXPR) and from the generated compound expression data. Specifically, the second training stage uses 80% basic expression images and 20% compound expression images. Here, "cls. Head" refers to the classification head, which is composed of linear layers.
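The stage-wise data mix described in the caption can be sketched as a simple sampler. This is an illustrative assumption, not the authors' data loader: the helper name `sample_stage_batch`, the idea of applying the ratio per batch, and the toy pools are hypothetical. Only the first two stage ratios (0% and 20% compound data) come from the caption; ratios for later stages are not stated here and are left out.

```python
import random

# Compound-data fraction per stage; only these two values are given in the caption.
STAGE_COMPOUND_RATIO = {1: 0.0, 2: 0.2}


def sample_stage_batch(basic_pool, compound_pool, batch_size, compound_ratio, rng):
    """Draw a batch with the requested share of compound samples
    (sampling with replacement, then shuffling the combined batch)."""
    n_compound = round(batch_size * compound_ratio)
    n_basic = batch_size - n_compound
    batch = rng.choices(basic_pool, k=n_basic) + rng.choices(compound_pool, k=n_compound)
    rng.shuffle(batch)
    return batch


# Toy pools: each item is (kind, index). In practice the compound pool would
# hold both natural and generated compound samples.
basic_pool = [("basic", i) for i in range(100)]
compound_pool = [("compound", i) for i in range(100)]

rng = random.Random(0)
stage1 = sample_stage_batch(basic_pool, compound_pool, 10, STAGE_COMPOUND_RATIO[1], rng)
stage2 = sample_stage_batch(basic_pool, compound_pool, 10, STAGE_COMPOUND_RATIO[2], rng)
print(sum(kind == "compound" for kind, _ in stage2))
```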