Affective Behaviour Analysis via Progressive Learning

Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu

TL;DR

This work targets affective behavior analysis across three interrelated tasks: Valence-Arousal (VA) estimation, Expression recognition (EXPR), and Action Unit (AU) detection, evaluated in the ABAW7 Multi-Task Learning challenge. It introduces a progressive multi-task learning framework that first trains task-specific models, then performs joint training enhanced by a Feature Fusion Module and a Temporal Convergence Module, guided by a strategy search over task configurations. The losses are BCE for AU, CE for EXPR, and a CCC-based objective for VA, combined with task weights into an overall objective $\mathcal{L}_{Overall}$ that optimizes all tasks concurrently. Experiments on Aff-Wild2-based data (with extra datasets used for pretraining) show state-of-the-art performance: a total score $P$ of $1.5286$, with task-level scores of AU F-score $0.5580$, EXPR F-score $0.4286$, and VA CCC $0.5420$. These results support the effectiveness of progressive learning, feature fusion, and temporal modeling for in-the-wild affective analysis.
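To make the loss combination concrete, here is a minimal pure-Python sketch of the three task losses and their weighted sum. It is not the authors' implementation: the CCC-based VA loss is assumed to be $1 - \mathrm{CCC}$ summed over valence and arousal (a common choice), and the function names, the `EPS` guard, and the default weights of 1.0 are all hypothetical.

```python
import math

EPS = 1e-8  # numerical guard; an assumption, not taken from the paper


def bce_loss(probs, labels):
    """Mean binary cross-entropy over per-AU probabilities (multi-label)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1.0 - EPS)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def ce_loss(class_probs, label):
    """Cross-entropy for one sample: probability vector and true class index."""
    return -math.log(max(class_probs[label], EPS))


def ccc(pred, target):
    """Concordance Correlation Coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2 + EPS)


def overall_loss(au, expr, valence, arousal, w_au=1.0, w_expr=1.0, w_va=1.0):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    l_au = bce_loss(*au)              # au = (probs, labels)
    l_expr = ce_loss(*expr)           # expr = (class_probs, label)
    # Assumed VA objective: 1 - CCC for valence plus 1 - CCC for arousal.
    l_va = (1.0 - ccc(*valence)) + (1.0 - ccc(*arousal))
    return w_au * l_au + w_expr * l_expr + w_va * l_va
```

Note that CCC is computed over a sequence of predictions rather than per sample, so in practice the VA term would be evaluated over a batch or clip of frames.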

Abstract

Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this field, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition holds the Multi-Task Learning Challenge based on the s-Aff-Wild2 database. The participants are required to develop a framework that achieves Valence-Arousal Estimation, Expression Recognition, and AU detection simultaneously. To achieve this goal, we propose a progressive multi-task learning framework that fully leverages the distinct focuses of each task on facial emotion features. Specifically, our method design can be summarized into three main aspects: 1) Separate Training and Joint Training: We first train each task model separately and then perform joint training based on the pre-trained models, fully utilizing the feature focus aspects of each task to improve the overall framework performance. 2) Feature Fusion and Temporal Modeling: We investigate effective strategies for fusing features extracted from each task-specific model and incorporate temporal feature modeling during the joint training phase, which further refines the performance of each task. 3) Joint Training Strategy Optimization: To identify the optimal joint training approach, we conduct a comprehensive strategy search, experimenting with various task combinations and training methodologies to further elevate the overall performance of each task. According to the official results, our team achieves first place in the MTL challenge with a total score of 1.5286 (i.e., AU F-score 0.5580, Expression F-score 0.4286, CCC VA score 0.5420). Our code is publicly available at https://github.com/YenanLiu/ABAW7th.
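As a quick consistency check, the reported total matches a plain sum of the three task metrics. The symbols $F_{AU}$, $F_{EXPR}$, and $CCC_{VA}$ below are shorthand introduced here, and reading the total score $P$ as their unweighted sum is an assumption inferred from the numbers, since the official metric definition is not stated in this text.

```latex
\begin{align*}
P &= F_{AU} + F_{EXPR} + CCC_{VA} \\
  &= 0.5580 + 0.4286 + 0.5420 \\
  &= 1.5286
\end{align*}
```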

Paper Structure

This paper contains 21 sections, 7 equations, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: Illustration of our proposed framework for the MTL competition. In the initial stage, we train models for each task separately. Once optimal performance is achieved, we begin joint training to enhance performance further. Taking EXPR model training as an example, features such as $F_{AU}$, $F_{V}$, and $F_{A}$ from the pre-trained encoders are fused with features from the current EXPR Encoder via the Feature Fusion Module. The fused features are then processed by the Temporal Convergence Module to capture temporal information. Finally, the features are sent to the task-specific heads (AU, EXPR, and VA) for predictions. The final integration scheme for each subtask depends on validation set performance; the detailed analysis is provided in the experiments section.
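The data flow in the caption (fuse per-frame features, model them over time, then apply a task head) can be sketched as follows. The internals of the Feature Fusion Module and the Temporal Convergence Module are not described in this text, so the sketch substitutes plain concatenation and a causal moving average purely as stand-ins; every function name, the `window` parameter, and the toy feature values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def fuse(features: Sequence[Vector]) -> Vector:
    """Stand-in for feature fusion: concatenate one frame's feature vectors."""
    fused: Vector = []
    for f in features:
        fused.extend(f)
    return fused


def temporal_smooth(frames: Sequence[Vector], window: int = 3) -> List[Vector]:
    """Stand-in for temporal modeling: causal moving average over frames."""
    out: List[Vector] = []
    for t in range(len(frames)):
        chunk = frames[max(0, t - window + 1): t + 1]
        dim = len(frames[t])
        out.append([sum(fr[d] for fr in chunk) / len(chunk) for d in range(dim)])
    return out


def linear_head(x: Vector, weights: Sequence[Vector]) -> Vector:
    """Toy task head: one dot product per output unit, no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Toy clip of two frames with 1-D features per encoder (hypothetical values).
f_expr = [[1.0], [3.0]]   # current EXPR encoder
f_au = [[0.0], [2.0]]     # pre-trained AU encoder
f_v = [[0.5], [0.5]]      # pre-trained valence encoder
f_a = [[1.0], [0.0]]      # pre-trained arousal encoder

fused_frames = [fuse([e, au, v, a]) for e, au, v, a in zip(f_expr, f_au, f_v, f_a)]
temporal_frames = temporal_smooth(fused_frames, window=2)
expr_scores = [linear_head(x, [[1.0, 0.0, 0.0, 0.0]]) for x in temporal_frames]
```

In a real system the stand-ins would be learned modules; the point here is only the ordering of the stages for a single target task.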