DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations

Yazhou Zhang; Mengyao Wang; Youxi Wu; Prayag Tiwari; Qiuchi Li; Benyou Wang; Jing Qin

TL;DR

DialogueLLM is a context- and emotion-knowledge-tuned LLM obtained by fine-tuning LLaMA 2-7B on a dataset of 2411 multi-modal dialogues (texts and video-derived descriptions) for emotion recognition in conversations (ERC). It treats ERC as a conditional generation task: the prompt combines the contextual history $C_z$, the utterance $T_k$, and a text description of the video, $\text{TextDescription}(V_k)$, and the model generates the emotion label $Y_k$. Empirical results on MELD, IEMOCAP, and EmoryNLP show state-of-the-art performance, with improvements over 15 baselines and existing SOTA LLMs, while ablation and prompting studies validate the importance of context, multimodal knowledge, and LoRA fine-tuning. The work demonstrates the viability of open-source, task-specific LLMs for nuanced affective understanding, with practical implications for ERC systems and multimodal NLP research, and outlines future work on richer video descriptions and broader affect modeling.
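As a rough illustration of the conditional-generation framing, the sketch below assembles a prompt from the contextual history ($C_z$), the utterance to classify ($T_k$), and a text description of its video ($V_k$). The `Utterance` class, the `build_erc_prompt` function, the prompt wording, and the label list are all hypothetical; the summary does not give the paper's actual instruction template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One turn of a conversation (hypothetical container, not from the paper)."""
    speaker: str
    text: str
    video_description: Optional[str] = None  # text derived from the video, if any


def build_erc_prompt(context: List[Utterance], target: Utterance,
                     labels: List[str]) -> str:
    """Combine context (C_z), target utterance (T_k) and the video
    description of V_k into one prompt; the model would generate Y_k."""
    lines = ["Context:"]
    for utt in context:
        lines.append(f"{utt.speaker}: {utt.text}")
    lines.append(f"Target utterance: {target.speaker}: {target.text}")
    if target.video_description:
        lines.append(f"Visual description: {target.video_description}")
    lines.append("Select the emotion of the target utterance from: "
                 + ", ".join(labels))
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative label set and dialogue, invented for this example.
    labels = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
    history = [Utterance("A", "Did you get the job?")]
    current = Utterance("B", "I start on Monday!", "B smiles and raises both arms.")
    print(build_erc_prompt(history, current, labels))
```

Keeping the visual description as an optional field means the same function can build text-only prompts, which is one way the context-only and multimodal settings of an ablation could share code.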

Abstract

Large language models (LLMs) and their variants have shown extraordinary efficacy across numerous downstream natural language processing (NLP) tasks, which has presented a new vision for the development of NLP. Despite their remarkable performance in natural language generation (NLG), LLMs lack a distinct focus on the emotion understanding domain. As a result, using LLMs for emotion recognition may lead to suboptimal precision. Another limitation of LLMs is that they are typically trained without leveraging multi-modal information. To overcome these limitations, we propose DialogueLLM, a context and emotion knowledge tuned LLM that is obtained by fine-tuning LLaMA models with 13,638 multi-modal (i.e., texts and videos) emotional dialogues. The visual information is treated as supplementary knowledge for constructing high-quality instructions. We offer a comprehensive evaluation of our proposed model on three benchmark emotion recognition in conversations (ERC) datasets and compare the results against the SOTA baselines and other SOTA LLMs. Additionally, DialogueLLM-7B can be easily trained using LoRA on a 40GB A100 GPU in 5 hours, facilitating reproducibility for other researchers.
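To give a sense of why LoRA fine-tuning of a 7B model fits on a single GPU, the sketch below counts the trainable parameters that low-rank adapters add. For a frozen weight of shape d_out x d_in, LoRA trains two matrices of shapes d_out x r and r x d_in, so it adds r(d_in + d_out) parameters. The hidden size (4096) and layer count (32) are those of LLaMA 2-7B; the rank of 8 and the choice of two adapted projections per layer are assumptions for illustration, not settings reported in this summary.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096   # LLaMA 2-7B hidden size
NUM_LAYERS = 32      # LLaMA 2-7B transformer layers
ASSUMED_RANK = 8     # assumption: LoRA rank is not given in this summary
ASSUMED_TARGETS = 2  # assumption: adapt two square projections per layer

per_adapter = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, ASSUMED_RANK)
total = NUM_LAYERS * ASSUMED_TARGETS * per_adapter

if __name__ == "__main__":
    print(f"per adapter: {per_adapter:,}")
    print(f"total trainable: {total:,}")
```

Under these assumptions the adapters hold a few million trainable parameters against roughly seven billion frozen ones, which is the kind of ratio that makes training on a single 40GB GPU plausible.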

Paper Structure (23 sections, 6 equations, 12 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Large Language Models
Emotion Recognition in Conversations
Methodology
Problem Formulation
Base Model
Emotion and Context Knowledge Based Instruction Dataset
Training and Implementation
Experiments
Research Question
Experimental Settings
Compared Baselines
Results and Analysis
Ablation Test
...and 8 more sections

Figures (12)

Figure 1: Sample utterances in a multi-modal conversation from the MELD dataset.
Figure 2: Overview of DialogueLLM fine-tuning and classification pipeline.
Figure 3: The distribution of three ERC datasets.
Figure 4: The distribution of seven basic emotions across three datasets.
Figure 5: The training loss of DialogueLLM.
...and 7 more figures
