Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun
TL;DR
This work addresses empathy-level alignment in empathetic response generation by casting the task as sequential decision-making optimized with reinforcement learning. The authors introduce EmpRL, which uses a fine-tuned T5 generator to initialize a policy, an empathy identifier to supply an empathy-aware reward based on three empathy communication mechanisms (emotional reaction, interpretation, and exploration), and a KL penalty to constrain policy updates, all trained with proximal policy optimization (PPO). Automatic and human evaluations on EmpatheticDialogues and PEC show that EmpRL improves empathy-level alignment (Emp-F1) and overall response quality, producing responses that capture both affective and cognitive dimensions of empathy. The work offers a practical way to give dialogue systems nuanced empathy and points to future extensions with larger models, multi-turn rewards, and retrieval-augmented generation.
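The TL;DR combines three training ingredients: a sequence-level empathy reward, a KL penalty against the initial policy, and PPO. The sketch below shows one common way these pieces fit together in RLHF-style setups; it is an illustration under stated assumptions, not EmpRL's actual code. It assumes the KL term is estimated per token as the log-probability gap between the current policy and a frozen reference (the fine-tuned T5), that the empathy reward is added at the final token, and that PPO uses its standard clipped surrogate objective. The function names and the values of `beta` and `eps` are hypothetical.

```python
import math
from typing import List


def shaped_rewards(empathy_reward: float,
                   policy_logprobs: List[float],
                   ref_logprobs: List[float],
                   beta: float = 0.1) -> List[float]:
    """Per-token rewards for one generated response (hypothetical helper).

    Each token is penalized by beta * (log pi(token) - log pi_ref(token)),
    a sample-based KL estimate that discourages drifting from the reference
    model; the sequence-level empathy reward is added at the last token.
    """
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must be non-empty and equal length")
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += empathy_reward
    return rewards


def ppo_clip_loss(new_logprobs: List[float],
                  old_logprobs: List[float],
                  advantages: List[float],
                  eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate loss (to be minimized), averaged over tokens."""
    total = 0.0
    for new, old, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new - old)                    # pi_new / pi_old
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
        total += min(ratio * adv, clipped * adv)       # pessimistic bound
    return -total / len(advantages)


# Example: the policy matches the reference, so only the empathy reward remains.
print(shaped_rewards(1.0, [-1.0, -2.0], [-1.0, -2.0]))
```

Placing the KL penalty inside the reward (rather than as a separate loss term) keeps the PPO objective unchanged. The advantages fed to `ppo_clip_loss` would then be computed from these shaped rewards, for example with generalized advantage estimation.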
Abstract
Empathetic response generation, which aims to understand the user's situation and feelings and respond empathically, is crucial for building human-like dialogue systems. Traditional approaches typically use maximum likelihood estimation as the training objective, yet they fail to align the empathy levels of generated and target responses. To this end, we propose an empathetic response generation framework using reinforcement learning (EmpRL). The framework develops an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL uses the pre-trained T5 model as the generator and further fine-tunes it to initialize the policy. To align the empathy levels of generated and target responses within a given context, an empathy reward function covering three empathy communication mechanisms (emotional reaction, interpretation, and exploration) is constructed using pre-designed and pre-trained empathy identifiers. During reinforcement learning training, the proximal policy optimization algorithm is used to fine-tune the policy, enabling the generation of empathetic responses. Both automatic and human evaluations demonstrate that the proposed EmpRL framework significantly improves the quality of generated responses, increases the similarity in empathy levels between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
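The abstract describes a reward that compares the empathy levels of the generated and target responses across three mechanisms, but it does not give the formula. The following is a minimal sketch of one plausible form, written only to make the idea concrete. It assumes each identifier maps a (context, response) pair to a discrete level in {0, 1, 2} (none, weak, strong), a scheme borrowed from earlier empathy-identification work, and it scores each mechanism as 1 minus the normalized level gap. The `empathy_reward` function, the level scale, and the keyword-based `toy_identifier` (a stand-in for the pre-trained identifiers) are all hypothetical.

```python
from typing import Callable, Dict

MECHANISMS = ("emotional_reaction", "interpretation", "exploration")
MAX_LEVEL = 2  # assumed level scale: 0 = none, 1 = weak, 2 = strong

# An identifier maps (context, response) to an empathy level (assumed interface).
Identifier = Callable[[str, str], int]


def empathy_reward(context: str, generated: str, target: str,
                   identifiers: Dict[str, Identifier]) -> float:
    """Hypothetical alignment reward in [0, 1].

    For each mechanism, compare the level the identifier assigns to the
    generated response with the level it assigns to the target response;
    identical levels score 1, maximally different levels score 0.
    """
    score = 0.0
    for name in MECHANISMS:
        identify = identifiers[name]
        gap = abs(identify(context, generated) - identify(context, target))
        score += 1.0 - gap / MAX_LEVEL
    return score / len(MECHANISMS)


def toy_identifier(keyword: str) -> Identifier:
    """Keyword stand-in for a trained classifier (illustration only)."""
    def identify(context: str, response: str) -> int:
        return MAX_LEVEL if keyword in response.lower() else 0
    return identify


identifiers = {
    "emotional_reaction": toy_identifier("sorry"),
    "interpretation": toy_identifier("i understand"),
    "exploration": toy_identifier("?"),
}
reward = empathy_reward("I lost my job today.",
                        "I'm so sorry to hear that.",
                        "I'm sorry. What happened?",
                        identifiers)
print(round(reward, 3))
```

In this toy example, the generated response matches the target on emotional reaction and interpretation but omits the target's exploratory question, so it loses one third of the reward. A reward of this shape pushes the policy toward the target's empathy profile rather than simply toward "more empathy".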
