Affect Recognition in Conversations Using Large Language Models

Shutong Feng, Guangzhi Sun, Nurul Lubis, Wen Wu, Chao Zhang, Milica Gašić

TL;DR

This study investigates whether large language models (LLMs) can recognise affect in conversations, covering both open-domain and task-oriented dialogues, using the IEMOCAP, EmoWOZ, and DAIC-WOZ datasets. It systematically compares zero-shot prompting, few-shot in-context learning (ICL), and task-specific fine-tuning (via LoRA) for several LLMs, and evaluates the impact of automatic speech recognition (ASR) errors. The results show that zero-shot LLMs generally lag behind supervised methods, though larger models such as GPT-4 can approach the state of the art (SOTA) in some settings, and instruction-following fine-tuning significantly narrows the gap. Emotion recognition tends to be robust to ASR noise, whereas depression detection is more vulnerable to transcription errors; a larger context and more ICL samples benefit emotion recognition more than depression detection. Overall, task-specific fine-tuning can yield performance close to SOTA using only part of the training data, highlighting the potential of LLMs as affect-aware components in dialogue systems while underscoring challenges in long-context processing and real-time deployment.

Abstract

Affect recognition, encompassing emotions, moods, and feelings, plays a pivotal role in human communication. In the realm of conversational artificial intelligence, the ability to discern and respond to human affective cues is a critical factor for creating engaging and empathetic interactions. This study investigates the capacity of large language models (LLMs) to recognise human affect in conversations, with a focus on both open-domain chit-chat dialogues and task-oriented dialogues. Leveraging three diverse datasets, namely IEMOCAP (Busso et al., 2008), EmoWOZ (Feng et al., 2022), and DAIC-WOZ (Gratch et al., 2014), covering a spectrum of dialogues from casual conversations to clinical interviews, we evaluate and compare LLMs' performance in affect recognition. Our investigation explores the zero-shot and few-shot capabilities of LLMs through in-context learning as well as their model capacities through task-specific fine-tuning. Additionally, this study takes into account the potential impact of automatic speech recognition errors on LLM predictions. With this work, we aim to shed light on the extent to which LLMs can replicate human-like affect recognition capabilities in conversations.

Paper Structure

This paper contains 36 sections, 2 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: A flowchart illustrating the affect recognition pipeline using Whisper and LLM. The designed prompt comprises parts introduced in Table \ref{tab:prompt-design}. Low-rank adaptation (LoRA) is used for fine-tuning open-source LLMs.
  • Figure 2: Change of model performance when fine-tuning with different proportions of the training data.