Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements
Yushan Qian, Wei-Nan Zhang, Ting Liu
TL;DR
The paper demonstrates that large language models (LLMs) can significantly improve empathetic response generation in dialogue, surpassing state-of-the-art baselines. It introduces three targeted improvements (semantically similar in-context learning, two-stage interactive generation, and knowledge-base augmentation with commonsense inferences from COMET) and validates them with extensive automatic and human evaluations. The study also explores GPT-4 as a surrogate for human evaluators and finds meaningful correlations with human judgments. The work advances practical empathetic dialogue systems and offers insights into efficient evaluation and knowledge integration for LLM-based generation.
Abstract
Empathetic dialogue is an indispensable part of building harmonious social relationships and contributes to the development of helpful AI. Previous approaches are mainly based on fine-tuning small-scale language models. With the advent of ChatGPT, the effectiveness of large language models (LLMs) in this field has attracted great attention. This work empirically investigates the performance of LLMs in generating empathetic responses and proposes three improvement methods: semantically similar in-context learning, two-stage interactive generation, and combination with a knowledge base. Extensive experiments show that LLMs benefit significantly from the proposed methods and achieve state-of-the-art performance in both automatic and human evaluations. Additionally, we explore the possibility of using GPT-4 to simulate human evaluators.
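The first improvement, semantically similar in-context learning, fills the few-shot prompt with demonstrations that resemble the current dialogue. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each candidate dialogue already has an embedding vector from some sentence encoder, and it ranks candidates by cosine similarity to the query and keeps the top k. The encoder, the toy three-dimensional vectors, the function names `select_demonstrations` and `build_prompt`, and the prompt template are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demonstrations(
    query_vec: Vector,
    pool: List[Tuple[str, str, Vector]],
    k: int,
) -> List[Tuple[str, str]]:
    """Return the k (context, response) pairs most similar to the query, most similar first.

    `pool` holds (dialogue_context, empathetic_response, embedding) triples.
    Assumption: the embeddings come from an unspecified sentence encoder.
    """
    ranked = sorted(
        pool,
        key=lambda item: cosine_similarity(query_vec, item[2]),
        reverse=True,
    )
    return [(context, response) for context, response, _ in ranked[:k]]


def build_prompt(demos: List[Tuple[str, str]], query_context: str) -> str:
    """Place the selected demonstrations before the new dialogue (hypothetical template)."""
    parts = [f"Dialogue: {c}\nEmpathetic response: {r}" for c, r in demos]
    parts.append(f"Dialogue: {query_context}\nEmpathetic response:")
    return "\n\n".join(parts)


# Usage with toy data: the vectors stand in for real sentence embeddings.
pool = [
    ("I failed my driving test today.",
     "That sounds really discouraging. You'll get it next time.",
     [0.9, 0.1, 0.0]),
    ("My dog just learned a new trick!",
     "That's so fun! You must be proud of him.",
     [0.0, 0.2, 0.9]),
    ("I didn't get the job I interviewed for.",
     "I'm sorry, rejection stings. It's their loss.",
     [0.7, 0.4, 0.2]),
]
query_vec = [0.85, 0.2, 0.05]
demos = select_demonstrations(query_vec, pool, k=2)
prompt = build_prompt(demos, "I bombed my exam even though I studied hard.")
```

Here the two setback dialogues are selected ahead of the upbeat one. Sorting the whole pool is adequate for a small candidate set; a large training corpus would more likely be searched with an approximate nearest-neighbour index.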
