Multi-Task Learning for Affect Analysis

Fazeel Asim

TL;DR

The paper addresses emotion recognition from images by comparing uni-task and multi-task learning across three related affective tasks: basic emotion recognition, action unit (AU) detection, and valence-arousal (VA) estimation. It uses a shared CNN backbone (ResNet-18) with task-specific heads and systematically investigates initialization, pre-training, data augmentation, and loss functions to optimize the multi-task formulation. Evaluations on Aff-Wild2 show mixed results: the multi-task model converges, but its performance varies across tasks. VA estimation is scored with the Concordance Correlation Coefficient (CCC) as $\mathcal{P}_{VA} = \frac{\rho_a + \rho_v}{2}$, where $\rho_a$ and $\rho_v$ are the CCC for arousal and valence, while the expression and AU tasks are scored with F1. These results highlight both the potential and the limitations of joint modeling. The study provides practical guidance for designing affective computing systems with joint representations, with implications for healthcare, marketing, and human–computer interaction through more integrated emotion understanding from images.
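
The VA score above is the mean of two CCC values. Unlike Pearson correlation, CCC also penalizes differences in scale and offset, so a prediction that tracks the labels but is shifted by a constant still loses score. A minimal stdlib-only sketch, assuming population (1/N) variances as in the usual CCC definition; the function names `ccc` and `p_va` are illustrative and not taken from the project's code:

```python
from statistics import fmean


def ccc(pred, gold):
    """Concordance Correlation Coefficient between predictions and labels.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/N) statistics.
    """
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and the same length")
    mx, my = fmean(pred), fmean(gold)
    vx = fmean((x - mx) ** 2 for x in pred)
    vy = fmean((y - my) ** 2 for y in gold)
    cov = fmean((x - mx) * (y - my) for x, y in zip(pred, gold))
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def p_va(valence_pred, valence_gold, arousal_pred, arousal_gold):
    """VA score: mean of the valence CCC (rho_v) and the arousal CCC (rho_a)."""
    rho_v = ccc(valence_pred, valence_gold)
    rho_a = ccc(arousal_pred, arousal_gold)
    return (rho_a + rho_v) / 2
```

For example, `ccc([1, 2, 3], [2, 3, 4])` evaluates to 4/7 (about 0.571) even though the Pearson correlation of the two sequences is 1, because the constant offset of 1 enters the denominator.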

Abstract

This project was my undergraduate final-year dissertation, supervised by Dimitrios Kollias. The research explores affective computing for image analysis, aiming to improve the efficiency and effectiveness of multi-task learning for emotion recognition. It investigates two primary approaches to the same problems: uni-task solutions and a multi-task approach. Each approach is tested across various formulations, variations, and initialization strategies to identify the best configuration. The project uses an existing neural network architecture and adapts it for multi-task learning by modifying its output layers and loss functions. The tasks are recognition of the 7 basic emotions, action unit detection, and valence-arousal estimation. Uni-task models trained for each individual task serve as the baselines against which the multi-task model is assessed. Variations within each approach, including loss functions and hyperparameter settings, are evaluated, and the impact of different initialization strategies and pre-training techniques on model convergence and accuracy is examined. By systematically exploring multi-task learning formulations, this research aims to contribute to more accurate and efficient models for recognizing and understanding emotions in images, with applications in the growing field of affective computing spanning healthcare, marketing, and human-computer interaction.
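
The abstract states that the architecture is adapted by modifying its output layers and loss functions but does not name the losses. The sketch below assumes one common combination for these three tasks, not necessarily the one used in the project: softmax cross-entropy over the 7 expression logits, mean binary cross-entropy over independent sigmoid AU outputs, and $1 - \mathcal{P}_{VA}$ (one minus the mean CCC) for the two VA outputs, combined as a weighted sum. The per-sample dictionary keys and the task weights `w_expr`, `w_au`, `w_va` are hypothetical, and the CCC term assumes the VA targets in a batch are not all identical (otherwise its denominator is zero).

```python
import math
from statistics import fmean


def expr_cross_entropy(logits, label):
    """Softmax cross-entropy for one sample over the 7 basic-expression logits."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def au_bce(logits, targets):
    """Mean binary cross-entropy over AUs, each treated as an independent sigmoid output."""
    # Stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
    losses = [max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
              for z, t in zip(logits, targets)]
    return fmean(losses)


def ccc(pred, gold):
    """Concordance Correlation Coefficient with population (1/N) statistics."""
    mx, my = fmean(pred), fmean(gold)
    vx = fmean((x - mx) ** 2 for x in pred)
    vy = fmean((y - my) ** 2 for y in gold)
    cov = fmean((x - mx) * (y - my) for x, y in zip(pred, gold))
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def multitask_loss(batch, w_expr=1.0, w_au=1.0, w_va=1.0):
    """Weighted sum of the three task losses over a batch (hypothetical weighting scheme).

    Each sample is a dict with hypothetical keys:
      expr_logits (7 floats), expr_label (int), au_logits / au_labels (one per AU),
      va_pred / va_gold ((valence, arousal) pairs).
    """
    l_expr = fmean(expr_cross_entropy(s["expr_logits"], s["expr_label"]) for s in batch)
    l_au = fmean(au_bce(s["au_logits"], s["au_labels"]) for s in batch)
    rho_v = ccc([s["va_pred"][0] for s in batch], [s["va_gold"][0] for s in batch])
    rho_a = ccc([s["va_pred"][1] for s in batch], [s["va_gold"][1] for s in batch])
    l_va = 1 - (rho_a + rho_v) / 2  # i.e. 1 - P_VA
    return w_expr * l_expr + w_au * l_au + w_va * l_va


# Usage: uniform logits and perfect VA predictions on a toy batch of two samples.
batch = [
    {"expr_logits": [0.0] * 7, "expr_label": 0, "au_logits": [0.0, 0.0],
     "au_labels": [1, 0], "va_pred": (0.1, 0.2), "va_gold": (0.1, 0.2)},
    {"expr_logits": [0.0] * 7, "expr_label": 3, "au_logits": [0.0, 0.0],
     "au_labels": [1, 0], "va_pred": (0.5, 0.6), "va_gold": (0.5, 0.6)},
]
print(multitask_loss(batch))  # log(7) + log(2) + 0 = log(14), about 2.639
```

Because Aff-Wild2 frames may not be annotated for every task, a full implementation would likely need to mask out the loss terms for missing labels; that masking is omitted here.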

Paper Structure

This paper contains 23 sections, 3 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Anger, Contempt, Fear, Disgust, Happiness, Sadness and Surprise.
  • Figure 2: Some facial Action Units
  • Figure 3: Valence and Arousal metrics in 2D space