Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data

Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang

TL;DR

Efficient-Empathy addresses data efficiency and robustness in empathetic dialogue modeling by using LLM-derived sensibility and rationality scores to filter training data. Training on a sensibility-focused subset (59% of the full EmpatheticDialogues (ED) data) yields SoTA empathetic performance, and combining sensibility and rationality experts in a Mixture-of-Experts (MoE) model improves results further, as validated by automatic metrics and human evaluation. The approach stays robust across different data-selection thresholds and reduces computational cost while preserving response quality. This data-centric methodology offers a practical path toward scalable, high-quality empathetic AI systems for human-centered applications.
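The selection step can be pictured as a per-conversation routing rule. The sketch below is hypothetical: it assumes each conversation already carries an LLM-assigned sensibility score and rationality score on a 1-10 scale, and it assumes a simple threshold rule (sensibility takes priority, then rationality, otherwise discard). The actual scoring prompts, score scale, and selection rule used by Efficient-Empathy may differ; the names `Conversation`, `select`, `tau_s`, and `tau_r` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    text: str
    sensibility: float  # hypothetical LLM-judged score, assumed 1-10
    rationality: float  # hypothetical LLM-judged score, assumed 1-10


def select(convs, tau_s=8.0, tau_r=8.0):
    """Split conversations into sensibility / rationality / discard buckets.

    Assumed rule (not taken from the paper): a conversation whose
    sensibility score reaches tau_s goes to the sensibility subset;
    otherwise, if its rationality score reaches tau_r, it goes to the
    rationality subset; everything else is discarded.
    """
    buckets = {"sensibility": [], "rationality": [], "discard": []}
    for c in convs:
        if c.sensibility >= tau_s:
            buckets["sensibility"].append(c)
        elif c.rationality >= tau_r:
            buckets["rationality"].append(c)
        else:
            buckets["discard"].append(c)
    return buckets


if __name__ == "__main__":
    data = [
        Conversation("a", 9, 5),
        Conversation("b", 6, 9),
        Conversation("c", 4, 3),
        Conversation("d", 8, 8),
    ]
    out = select(data)
    # Fraction of the data kept for the sensibility subset.
    kept = len(out["sensibility"]) / len(data)
    print({k: [c.text for c in v] for k, v in out.items()}, f"{kept:.0%}")
```

On the toy data this prints `{'sensibility': ['a', 'd'], 'rationality': ['b'], 'discard': ['c']} 50%`. Raising `tau_s` shrinks the sensibility subset; the robustness claim corresponds to a model trained on the selected subset staying strong across such threshold choices.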

Abstract

In recent years, with the rapid advancement of large language models (LLMs), strong empathetic response capability has become a crucial requirement. Consequently, managing and understanding large-scale empathetic dialogue datasets has gained increasing importance. However, models are typically trained on empathetic data without any quality selection, which leads to inefficient data usage and wasted computational resources. Training on raw data can also result in poor performance in empathetic dialogue. In this work, we present Efficient-Empathy, a data selection algorithm based on sensibility and rationality scores that automatically selects sensibility and rationality data while discarding low-quality data. Using only the sensibility data (59% of the full dataset), our sensibility model achieves state-of-the-art (SoTA) performance. Furthermore, the sensibility model maintains SoTA performance across multiple data-selection hyperparameter settings, demonstrating the robustness of our method. By integrating the sensibility and rationality data with a Mixture-of-Experts (MoE) structure, we achieve even higher performance, demonstrating the effectiveness of the Efficient-Empathy algorithm.
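The abstract does not spell out how the MoE structure combines the two experts. As a generic illustration only, not the paper's architecture, the sketch below shows a standard soft-gated mixture: a gate produces softmax weights over a hypothetical sensibility expert and rationality expert, and the output is the weighted sum of their output vectors. The experts, the gate, and the vector outputs are toy stand-ins for fine-tuned LLMs.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(
    x: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate: Callable[[Vector], Sequence[float]],
) -> Vector:
    """Soft mixture of experts: weight each expert's output by the gate softmax."""
    weights = softmax(gate(x))
    outputs = [f(x) for f in experts]
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


def sensibility_expert(x: Vector) -> Vector:
    # Hypothetical toy stand-in for a fine-tuned sensibility expert.
    return [2.0 * v for v in x]


def rationality_expert(x: Vector) -> Vector:
    # Hypothetical toy stand-in for a fine-tuned rationality expert.
    return [v + 1.0 for v in x]


def gate(x: Vector) -> List[float]:
    # Hypothetical gate: the sensibility logit grows with the first feature.
    return [x[0], 0.0]


if __name__ == "__main__":
    # Equal logits give equal weights, so this averages [0.0, 2.0] and [1.0, 2.0].
    print(mix_experts([0.0, 1.0], [sensibility_expert, rationality_expert], gate))
```

This prints `[0.5, 2.0]`. An MoE built from LLM experts would typically gate inside the network (for example per token or per layer) rather than over final vectors; the toy version only shows the weighting arithmetic.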

Paper Structure

This paper contains 30 sections, 10 equations, 8 figures, 7 tables, and 2 algorithms.

Figures (8)

  • Figure 1: The pipeline of our approach: (a) the data selection method used to classify conversations as sensibility or rationality data; (b) MoE training using the sensibility and rationality data.
  • Figure 2: Comparison of Empathetic Responses from Different Models. Sensibility, Rationality, and MoE models are trained using data selected by Efficient-Empathy.
  • Figure 3: The overall pipeline of Efficient-Empathy consists of three parts: (a) the Data Selection Module, which classifies the empathetic dataset into sensibility, rationality, and discard datasets; (b) the Domain Expert Training Module, which uses the selected datasets to fine-tune LLMs and acquire sensibility and rationality experts; and (c) the Expert Mixing Module, which integrates the sensibility and rationality experts into the MoE empathy model.
  • Figure 4: 2D Histogram of Rationality and Sensibility Scores. The x-axis represents rationality scores, the y-axis represents sensibility scores, and the color intensity indicates the frequency of each combination of scores.
  • Figure 5: Carefully designed prompts for data evaluation and empathetic response generation.
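Figure 4's 2D histogram is a count of how often each (rationality, sensibility) score pair occurs. A minimal sketch, assuming integer scores on a hypothetical 1-10 scale (the actual scale is not stated in this summary), builds the frequency grid that a heatmap would color. Rows index sensibility (the y-axis) and columns index rationality (the x-axis), matching the caption; `score_histogram` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def score_histogram(
    pairs: Iterable[Tuple[int, int]], lo: int = 1, hi: int = 10
) -> List[List[int]]:
    """Return grid[s - lo][r - lo] = number of conversations with
    rationality score r and sensibility score s.

    Scores are assumed to be integers in [lo, hi] (hypothetical scale).
    """
    counts = Counter(pairs)  # keys are (rationality, sensibility)
    size = hi - lo + 1
    grid = [[0] * size for _ in range(size)]
    for (r, s), n in counts.items():
        if not (lo <= r <= hi and lo <= s <= hi):
            raise ValueError(f"score out of range: {(r, s)}")
        grid[s - lo][r - lo] = n
    return grid


if __name__ == "__main__":
    scores = [(3, 8), (3, 8), (7, 2), (10, 10)]  # (rationality, sensibility)
    grid = score_histogram(scores)
    print(grid[8 - 1][3 - 1], grid[2 - 1][7 - 1], grid[10 - 1][10 - 1])
```

This prints `2 1 1`. Passing `grid` to Matplotlib's `imshow` with `origin="lower"` would render it as a heatmap with low scores at the bottom-left, as in a conventional 2D histogram.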