From Generic Empathy to Personalized Emotional Support: A Self-Evolution Framework for User Preference Alignment
Jing Ye, Lu Xiang, Yaping Zhang, Chengqing Zong
TL;DR
The paper tackles the problem of generic empathy in emotional support conversations (ESC) by introducing a self-evolution framework that learns and adapts to users' implicit preferences. The framework has two phases: Emotional Support Experience Acquisition, which establishes basic emotional support (ES) capabilities via LoRA-based fine-tuning, and Self-Improvement for Personalized Emotional Support, which induces personalization through self-reflection, synthetic preference data, and direct preference optimization. The method yields three models, M^0, M^1, and M^2. Extensive objective and subjective evaluations show improved diversity, coherence, and alignment with user needs across multiple backbones and ESC datasets. This work advances practical, personalized ESC systems for multi-turn interactions, with implications for mental health support, companionship, and customer service.
Abstract
Effective emotional support hinges on understanding users' emotions and needs to provide meaningful comfort during multi-turn interactions. Large Language Models (LLMs) show great potential for expressing empathy; however, they often deliver generic and one-size-fits-all responses that fail to address users' specific needs. To tackle this issue, we propose a self-evolution framework designed to help LLMs improve their responses to better align with users' implicit preferences concerning user profiles (personalities), emotional states, and specific situations. Our framework consists of two distinct phases: (1) Emotional Support Experience Acquisition, where LLMs are fine-tuned on limited emotional support conversation data to provide basic support, and (2) Self-Improvement for Personalized Emotional Support, where LLMs leverage self-reflection and self-refinement to generate personalized responses. Through iterative direct preference optimization between the pre- and post-refined responses, our model generates responses that reflect a better understanding of the user's implicit preferences. Extensive experiments and evaluations demonstrate that our method significantly enhances the model's performance in emotional support, reducing unhelpful responses and minimizing discrepancies between user preferences and model outputs.
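To make the preference optimization step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the post-refinement response is treated as the preferred ("chosen") sample and the pre-refinement response as the dispreferred ("rejected") sample, and it uses the standard per-example direct preference optimization loss. The names `PreferencePair`, `build_pair`, and `dpo_loss`, as well as the default `beta=0.1`, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One synthetic preference sample (hypothetical structure)."""
    context: str   # dialogue history shown to the model
    chosen: str    # post-refinement response (assumed to be preferred)
    rejected: str  # pre-refinement response (assumed to be dispreferred)


def build_pair(context: str, draft: str, refined: str) -> PreferencePair:
    # Assumption: self-refinement improves the draft, so the refined
    # response is labeled as chosen and the draft as rejected.
    return PreferencePair(context=context, chosen=refined, rejected=draft)


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-example DPO loss on sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the frozen reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability mass toward the refined response relative to the reference. Under this reading, an iterative scheme would regenerate drafts and refinements with the updated model and repeat the optimization.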
