
EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis

Zhiwei Liu, Kailai Yang, Tianlin Zhang, Qianqian Xie, Sophia Ananiadou

TL;DR

EmoLLMs introduce a comprehensive framework for affective analysis by combining an open, multi-task instruction dataset (AAID) with a corresponding evaluation benchmark (AEB) and a series of fine-tuned open-source LLMs. Treating affective analysis as a generative multitask problem, the authors build 234K AAID prompts from SemEval-2018 Task 1 and evaluate across 14 diverse datasets (8 regression, 6 classification) to test generalization. Empirically, EmoLLMs outperform other open-source LLMs and sentiment tools, surpass ChatGPT and GPT-4 on most AEB tasks, and show robust cross-domain transfer. The work demonstrates practical potential for high-quality affective annotation and downstream deployment, while acknowledging limitations such as language scope and modality, with plans for multilingual and multimodal extensions.

Abstract

Sentiment analysis and emotion detection are important research topics in natural language processing (NLP) and benefit many downstream tasks. With the widespread application of LLMs, researchers have started exploring the application of LLMs based on instruction-tuning in the field of sentiment analysis. However, these models only focus on single aspects of affective classification tasks (e.g., sentiment polarity or categorical emotions), and overlook the regression tasks (e.g., sentiment strength or emotion intensity), which leads to poor performance in downstream tasks. The main reason is the lack of comprehensive affective instruction tuning datasets and evaluation benchmarks that cover various affective classification and regression tasks. Moreover, although emotional information is useful for downstream tasks, existing downstream datasets lack high-quality and comprehensive affective annotations. In this paper, we propose EmoLLMs, the first series of open-sourced instruction-following LLMs for comprehensive affective analysis based on fine-tuning various LLMs with instruction data, the first multi-task affective analysis instruction dataset (AAID) with 234K data samples based on various classification and regression tasks to support LLM instruction tuning, and a comprehensive affective evaluation benchmark (AEB) with 14 tasks from various sources and domains to test the generalization ability of LLMs. We propose a series of EmoLLMs by fine-tuning LLMs with AAID to solve various affective instruction tasks. We compare our models with a variety of LLMs on AEB, where our models outperform all other open-sourced LLMs and surpass ChatGPT and GPT-4 in most tasks, which shows that the series of EmoLLMs achieves ChatGPT-level and GPT-4-level generalization capabilities on affective analysis tasks and demonstrates that our models can be used as affective annotation tools.

Paper Structure (20 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: An overview of multi-task instruction tuning of EmoLLaMA for multiple affective analysis tasks.
  • Figure 2: Comparison between EmoLLMs and PLMs and zero-shot/few-shot methods on AEB-1. The evaluation score for the first four tasks is the Pearson correlation coefficient (PCC); EI-reg and EI-oc use the macro-average. E-c uses the macro-F1 score.
  • Figure 3: Comparison between EmoLLMs and LLMs without fine-tuning on AEB-2. The evaluation score for the first six tasks (regression tasks) is the PCC. The last three tasks (classification tasks) use the macro-F1 score.
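
The two scores named in the figure captions, the Pearson correlation coefficient and macro-F1, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' evaluation code: the function names (pearson, macro_avg_pearson, macro_f1) are hypothetical, macro_f1 assumes single-label classification (a multi-label task such as E-c would need a per-label variant), and the macro-average for EI-reg and EI-oc is assumed here to mean the mean of per-emotion correlations.

```python
import math
from typing import Hashable, Mapping, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between gold and predicted scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def macro_avg_pearson(
    per_emotion: Mapping[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Mean of per-emotion correlations (assumed meaning of 'macro-average')."""
    scores = [pearson(gold, pred) for gold, pred in per_emotion.values()]
    return sum(scores) / len(scores)


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 (single-label simplification)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two equal-length, non-empty label sequences")
    f1_scores = []
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); denom > 0 for any label that occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For a model that emits text, the generated score or label would first have to be parsed into a number or class before calling these functions; that parsing step is not covered by this sketch.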