LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

Weichu Liu, Jing Xiong, Yuxuan Hu, Zixuan Li, Minghuan Tan, Ningning Mao, Hui Shen, Wendong Xu, Chaofan Tao, Min Yang, Chengming Li, Lingpeng Kong, Ngai Wong

TL;DR

This work introduces LongEmotion, a long-context EI benchmark with six tasks spanning emotion recognition, knowledge application, and empathetic generation, reaching an average context length of 15,341 tokens. It combines a retrieval-augmented approach with Collaborative Emotional Modeling (CoEM), enabling multi-agent enrichment and emotional ensemble generation to improve EI in extended dialogues. Across extensive experiments, CoEM consistently enhances EI performance over standard RAG, with ablations highlighting the value of multi-agent reasoning and targeted emotional enrichment. The benchmark and framework provide a rigorous, psychology-informed platform for evaluating and advancing EI in long-context LLM interactions, with implications for safer, more coherent, and emotionally intelligent AI systems.

Abstract

Large language models (LLMs) have made significant progress in Emotional Intelligence (EI) and long-context modeling. However, existing benchmarks often overlook the fact that emotional information processing unfolds as a continuous long-context process. To address the absence of multidimensional EI evaluation in long-context inference and explore model performance under more challenging conditions, we present LongEmotion, a benchmark that encompasses a diverse suite of tasks targeting the assessment of models' capabilities in Emotion Recognition, Knowledge Application, and Empathetic Generation, with an average context length of 15,341 tokens. To enhance performance under realistic constraints, we introduce the Collaborative Emotional Modeling (CoEM) framework, which integrates Retrieval-Augmented Generation (RAG) and multi-agent collaboration to improve models' EI in long-context scenarios. We conduct a detailed analysis of various models in long-context settings, investigating how reasoning mode activation, RAG-based retrieval strategies, and context-length adaptability influence their EI performance. Our project page is: https://longemotion.github.io/

Paper Structure

This paper contains 53 sections, 43 figures, and 11 tables.

Figures (43)

  • Figure 1: (a) Sequence length denotes average model output length for Emotion Expression, and average input context length for other tasks. (b) Distribution of sample counts across the six tasks, illustrating the overall composition of the dataset.
  • Figure 2: An illustrative overview of the LongEmotion dataset. To comprehensively evaluate the EI of LLMs in long-context interaction, we design six tasks: Emotion Classification, Emotion Detection, Emotion QA, Emotion Conversation, Emotion Summary, and Emotion Expression.
  • Figure 3: Quality Evaluation on Emotion Conversation.
  • Figure 4: Annotation process of Emotion QA.
  • Figure 5: The pipeline of Collaborative Emotional Modeling (CoEM).
  • ...and 38 more figures