Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data
Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang
TL;DR
Efficient-Empathy addresses data efficiency and robustness in empathetic dialogue modeling. It uses LLM-derived sensibility and rationality scores to filter training data. Training on the sensibility-focused subset (59% of the full EmpatheticDialogues (ED) dataset) yields state-of-the-art (SoTA) empathetic performance. Combining sensibility and rationality data through a Mixture-of-Experts (MoE) structure improves results further, as validated by automatic metrics and human evaluation. The approach remains robust across varying data-selection thresholds and reduces computational cost while preserving high-quality empathetic responses. This data-centric methodology offers a practical path toward scalable, high-quality empathetic dialogue systems for human-centered AI applications.
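The score-based filtering described above can be sketched as threshold selection over pre-scored dialogues. This is a minimal sketch, not the authors' implementation. It assumes each dialogue already carries an LLM-assigned sensibility score and rationality score on a 0 to 10 scale. The scale, the threshold values, and the names `Dialogue`, `select_subsets`, and `kept_fraction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    text: str
    # Hypothetical LLM-assigned scores; a 0-10 scale is assumed, not taken from the paper.
    sensibility: float
    rationality: float


def select_subsets(data, sens_threshold, rat_threshold):
    """Split scored dialogues into sensibility and rationality subsets.

    A dialogue that clears neither threshold is treated as low quality and dropped.
    A dialogue that clears both thresholds appears in both subsets (an assumption).
    """
    sensibility = [d for d in data if d.sensibility >= sens_threshold]
    rationality = [d for d in data if d.rationality >= rat_threshold]
    return sensibility, rationality


def kept_fraction(subset, data):
    """Fraction of the full dataset retained in a subset."""
    return len(subset) / len(data) if data else 0.0


# Toy scored dataset (illustrative values only).
data = [
    Dialogue("a", sensibility=8.0, rationality=3.0),
    Dialogue("b", sensibility=4.0, rationality=9.0),
    Dialogue("c", sensibility=2.0, rationality=2.0),
    Dialogue("d", sensibility=7.5, rationality=7.0),
]

sens, rat = select_subsets(data, sens_threshold=7.0, rat_threshold=7.0)
print([d.text for d in sens])     # ['a', 'd']
print([d.text for d in rat])      # ['b', 'd']
print(kept_fraction(sens, data))  # 0.5
```

You could run a threshold-robustness check of the kind the TL;DR mentions by sweeping `sens_threshold` over several values and retraining on each resulting subset. Here, dialogue "c" clears neither threshold and is discarded as low quality.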
Abstract
In recent years, with rapid advances in large language models (LLMs), strong empathetic response capability has become a crucial requirement. Consequently, managing and understanding large-scale empathetic dialogue datasets has gained increasing importance. However, models are typically trained on empathetic data without any quality selection, which leads to inefficient data usage and wasted computational resources. Training on raw data can also degrade performance in empathetic dialogue. In this work, we present Efficient-Empathy, a data selection algorithm based on sensibility and rationality scores. It automatically selects sensibility and rationality data while discarding low-quality data. Using only the sensibility data (59% of the full dataset), our trained sensibility model efficiently achieves state-of-the-art (SoTA) performance. Moreover, the sensibility model maintains SoTA performance across multiple data-selection hyperparameter settings, showing the robustness of our method. By integrating sensibility and rationality data with a Mixture-of-Experts (MoE) structure, we achieve even higher performance, demonstrating the effectiveness of the Efficient-Empathy algorithm.
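The abstract does not detail how the MoE structure combines the two kinds of data. As a generic illustration only, the sketch below shows soft Mixture-of-Experts gating over two experts. It assumes one expert is trained on the sensibility subset and one on the rationality subset, and that a learned linear router weights their outputs per input. The names `softmax`, `linear_gate`, and `moe_combine`, the vector outputs, and the router weights are all hypothetical. In a real LLM the experts and router would typically be layers inside the network, not standalone vectors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_gate(x, gate_weights):
    """Hypothetical linear router: one logit per expert, a dot product with x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]


def moe_combine(x, expert_outputs, gate_weights):
    """Soft MoE: weight each expert's output vector by its softmax gate probability."""
    weights = softmax(linear_gate(x, gate_weights))
    dim = len(expert_outputs[0])
    combined = [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]
    return combined, weights


# Stand-ins for the outputs of a sensibility expert and a rationality expert.
sensibility_out = [1.0, 0.0]
rationality_out = [0.0, 1.0]

# Router weights chosen so this input favors the sensibility expert 3:1.
gate_weights = [[math.log(3.0), 0.0], [0.0, 0.0]]
x = [1.0, 0.0]

combined, weights = moe_combine(x, [sensibility_out, rationality_out], gate_weights)
print([round(w, 3) for w in weights])   # [0.75, 0.25]
print([round(v, 3) for v in combined])  # [0.75, 0.25]
```

Soft gating keeps both experts active and lets the router shift weight between them per input. A top-k router that activates only the highest-scoring expert is a common, sparser alternative.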
