LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction
Weichu Liu, Jing Xiong, Yuxuan Hu, Zixuan Li, Minghuan Tan, Ningning Mao, Hui Shen, Wendong Xu, Chaofan Tao, Min Yang, Chengming Li, Lingpeng Kong, Ngai Wong
TL;DR
This work introduces LongEmotion, a long-context Emotional Intelligence (EI) benchmark with six tasks spanning emotion recognition, knowledge application, and empathetic generation, with an average context length of 15,341 tokens. It also introduces Collaborative Emotional Modeling (CoEM), a framework that combines Retrieval-Augmented Generation (RAG) with multi-agent enrichment and emotional ensemble generation to improve EI in extended dialogues. In the experiments, CoEM consistently improves EI performance over standard RAG, and ablations highlight the value of multi-agent reasoning and targeted emotional enrichment. Together, the benchmark and framework provide a rigorous, psychology-informed platform for evaluating and advancing EI in long-context LLM interactions, with implications for safer, more coherent, and more emotionally intelligent AI systems.
Abstract
Large language models (LLMs) have made significant progress in Emotional Intelligence (EI) and long-context modeling. However, existing benchmarks often overlook the fact that emotional information processing unfolds as a continuous long-context process. To address the absence of multidimensional EI evaluation in long-context inference and explore model performance under more challenging conditions, we present LongEmotion, a benchmark that encompasses a diverse suite of tasks targeting the assessment of models' capabilities in Emotion Recognition, Knowledge Application, and Empathetic Generation, with an average context length of 15,341 tokens. To enhance performance under realistic constraints, we introduce the Collaborative Emotional Modeling (CoEM) framework, which integrates Retrieval-Augmented Generation (RAG) and multi-agent collaboration to improve models' EI in long-context scenarios. We conduct a detailed analysis of various models in long-context settings, investigating how reasoning mode activation, RAG-based retrieval strategies, and context-length adaptability influence their EI performance. Our project page is: https://longemotion.github.io/
