Affective Behaviour Analysis via Progressive Learning
Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu
TL;DR
This work addresses affective behavior analysis across three interrelated tasks: Valence-Arousal (VA) estimation, Expression Recognition (EXPR), and Action Unit (AU) detection, as evaluated in the Multi-Task Learning challenge of the 7th ABAW competition. It introduces a progressive multi-task learning framework that first trains task-specific models and then trains them jointly, with a Feature Fusion Module and a Temporal Convergence Module added in the joint stage and a strategy search over task configurations guiding the setup. The losses are binary cross-entropy (BCE) for AU, cross-entropy (CE) for EXPR, and a CCC-based objective for VA, where CCC is the concordance correlation coefficient; they are combined with task weights into an overall objective $\mathcal{L}_{Overall}$ so that all tasks are optimized concurrently. Experiments on Aff-Wild2-based data (with extra datasets used for pretraining) yield a total score $P$ of $1.5286$, with AU F-score $0.5580$, EXPR F-score $0.4286$, and VA CCC $0.5420$. According to the official results this placed first in the challenge, which supports the effectiveness of progressive learning, feature fusion, and temporal modeling for in-the-wild affective analysis.
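To make the loss combination concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the use of $1 - \mathrm{CCC}$ as the VA loss, the summation over valence and arousal, and the default unit weights are all assumptions for illustration; the summary only states that BCE, CE, and a CCC-based objective are combined with task weights.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over AU labels (probs are sigmoid outputs)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def ce_loss(probs, label, eps=1e-7):
    """Cross-entropy for one sample; probs are softmax outputs over expressions."""
    return -math.log(max(probs[label], eps))

def ccc(pred, target, eps=1e-8):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2 + eps)

def va_loss(v_pred, v_true, a_pred, a_true):
    """Assumed CCC-based VA objective: (1 - CCC) summed over valence and arousal."""
    return (1.0 - ccc(v_pred, v_true)) + (1.0 - ccc(a_pred, a_true))

def overall_loss(l_au, l_expr, l_va, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses; the weights are hypothetical."""
    w_au, w_expr, w_va = weights
    return w_au * l_au + w_expr * l_expr + w_va * l_va
```

In a real training loop the same quantities would be computed on batched tensors with an autodiff framework; the scalar version above only shows how the three terms relate.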
Abstract
Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this field, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition holds the Multi-Task Learning (MTL) Challenge based on the s-Aff-Wild2 database. Participants are required to develop a framework that performs Valence-Arousal Estimation, Expression Recognition, and AU detection simultaneously. To this end, we propose a progressive multi-task learning framework that fully leverages the distinct focus of each task on facial emotion features. Specifically, our method has three main components: 1) Separate Training and Joint Training: We first train each task model separately and then perform joint training starting from the pre-trained models, exploiting the features each task focuses on to improve the performance of the overall framework. 2) Feature Fusion and Temporal Modeling: We investigate effective strategies for fusing the features extracted by each task-specific model and incorporate temporal feature modeling during the joint training phase, which further improves the performance of each task. 3) Joint Training Strategy Optimization: To identify the best joint training approach, we conduct a comprehensive strategy search, experimenting with various task combinations and training methodologies to further improve the performance of each task. According to the official results, our team achieves first place in the MTL challenge with a total score of 1.5286 (i.e., AU F-score 0.5580, Expression F-score 0.4286, CCC VA score 0.5420). Our code is publicly available at https://github.com/YenanLiu/ABAW7th.
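The reported total score is consistent with a plain sum of the three task-level metrics. The short check below illustrates that arithmetic; the function name `total_score` is hypothetical, and the additive form is inferred from the reported numbers rather than quoted from the challenge rules.

```python
def total_score(au_f1, expr_f1, va_ccc):
    """Sum of the three task metrics, as implied by the reported figures."""
    return au_f1 + expr_f1 + va_ccc

# Values taken from the abstract.
p = total_score(au_f1=0.5580, expr_f1=0.4286, va_ccc=0.5420)
print(round(p, 4))  # 1.5286
```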
