Affective Computing Has Changed: The Foundation Model Disruption

Björn Schuller; Adria Mallol-Ragolta; Alejandro Peña Almansa; Iosif Tsangko; Mostafa M. Amin; Anastasia Semertzidou; Lukas Christ; Shahin Amiriparian

TL;DR

This work aims to raise awareness of the power of Foundation Models in the field of Affective Computing by synthetically generating and analysing multimodal affective data, focusing on vision, linguistics, and speech (acoustics).

Abstract

The dawn of Foundation Models has, on the one hand, revolutionised a wide range of research problems and, on the other hand, democratised access to and use of AI-based tools by the general public. We even observe these models making inroads into disciplines related to human psychology, such as Affective Computing, suggesting emergent affective capabilities. In this work, we aim to raise awareness of the power of Foundation Models in the field of Affective Computing by synthetically generating and analysing multimodal affective data, focusing on vision, linguistics, and speech (acoustics). We also discuss some fundamental problems, such as ethical issues and regulatory aspects, related to the use of Foundation Models in this research area.

Paper Structure

This paper contains 14 sections, 5 figures, and 12 tables.

Introduction
Emergence in Foundation Models
The Vision Modality Has Changed
  Generation
  Analysis
The Linguistic Modality Has Changed
  Generation
  Analysis
The Speech Modality Has (Not Yet) Changed
  Generation
  Analysis
The Evaluation Is Changing
Concerns and Regulations Have Changed
Outlook and Conclusions

Figures (5)

Figure 1: Synthetic facial images of a young, white-skinned woman conveying the 'Big Six' Ekman emotions ekman1971constants, in addition to the neutral state. All images were generated with Stable Diffusion XL podell2023sdxl, conditioned on four different styles: photorealistic (first row), cartoon-painting (second row), anime (third row), and 3D (fourth row).
Figure 2: Confusion matrices obtained by analysing the facial images generated in each of the four styles with the pre-trained ViT-FER model.
Figure 3: Pipeline of the affective text style transfer process used to generate affective sentences, with 'surprise' as the prompted emotion. We then classify the synthesised sentences using RoBERTa, GPT-3.5, and GPT-4.
Figure 4: UAR scores obtained with the RoBERTa, GPT-3.5, and GPT-4 models when recognising the emotions conveyed by the synthetic sentences generated by LLaMA2 (left), Mistral (centre), and Mixtral (right).
Figure 5: Confusion matrices showing the performance (in %) of the fine-tuned RoBERTa baseline on the benchmarks synthesised by LLaMA2, Mistral, and Mixtral, and on the GoEmotions test benchmark.
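Figures 4 and 5 summarise results as UAR scores and as confusion matrices in percent. The captions do not spell out how these are computed, so the following Python sketch assumes the standard definitions: UAR as the unweighted mean of per-class recalls (macro-averaged recall), and confusion matrices normalised per true class so that each row sums to 100 %. The function names and toy labels are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> Matrix:
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def row_normalise_percent(matrix: Matrix) -> List[List[float]]:
    """Express each row as percentages of its true class (assumed normalisation; empty rows stay at 0)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * count / total if total else 0.0 for count in row])
    return result


def unweighted_average_recall(matrix: Matrix) -> float:
    """Mean of per-class recalls, each class weighted equally regardless of its size.

    Classes with no true samples are skipped (an assumption; conventions differ).
    """
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    labels = ["anger", "joy", "surprise"]
    y_true = ["anger", "anger", "joy", "joy", "joy", "surprise"]
    y_pred = ["anger", "joy", "joy", "joy", "surprise", "surprise"]
    cm = confusion_matrix(y_true, y_pred, labels)
    accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
    print(f"UAR = {unweighted_average_recall(cm):.3f}, accuracy = {accuracy:.3f}")
    # prints: UAR = 0.722, accuracy = 0.667
```

Because every class counts equally, UAR penalises a classifier that performs well only on frequent emotions, which is why it can differ from plain accuracy on imbalanced test sets, as in the toy example above.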
