From Generic Empathy to Personalized Emotional Support: A Self-Evolution Framework for User Preference Alignment

Jing Ye, Lu Xiang, Yaping Zhang, Chengqing Zong

TL;DR

The paper addresses generic empathy in emotional support conversations (ESC) with a self-evolution framework that learns and adapts to users' implicit preferences. The framework has two phases: Emotional Support Experience Acquisition, which establishes basic emotional support capabilities via LoRA-based fine-tuning, and Self-Improvement for Personalized Emotional Support, which induces personalization through self-reflection, synthetic preference data, and direct preference optimization. The method yields the $\mathcal{M}^0$, $\mathcal{M}^1$, and $\mathcal{M}^2$ models; objective and subjective evaluations show improved diversity, coherence, and alignment with user needs across multiple backbones and ESC datasets. The work advances practical, personalized ESC systems for multi-turn interactions, with implications for mental health support, companionship, and customer service.

Abstract

Effective emotional support hinges on understanding users' emotions and needs to provide meaningful comfort during multi-turn interactions. Large Language Models (LLMs) show great potential for expressing empathy; however, they often deliver generic and one-size-fits-all responses that fail to address users' specific needs. To tackle this issue, we propose a self-evolution framework designed to help LLMs improve their responses to better align with users' implicit preferences concerning user profiles (personalities), emotional states, and specific situations. Our framework consists of two distinct phases: (1) Emotional Support Experience Acquisition, where LLMs are fine-tuned on limited emotional support conversation data to provide basic support, and (2) Self-Improvement for Personalized Emotional Support, where LLMs leverage self-reflection and self-refinement to generate personalized responses. Through iterative direct preference optimization between the pre- and post-refined responses, our model generates responses that reflect a better understanding of the user's implicit preferences. Extensive experiments and evaluations demonstrate that our method significantly enhances the model's performance in emotional support, reducing unhelpful responses and minimizing discrepancies between user preferences and model outputs.
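
The preference-optimization step described above can be illustrated with a minimal sketch. It assumes that the post-refined response is treated as "chosen" and the pre-refined response as "rejected", and that the standard DPO objective is used; the names `PreferencePair`, `build_pair`, and `dpo_loss`, as well as the value `beta=0.1`, are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    context: str   # dialogue history shown to the model
    chosen: str    # post-refined response (assumed to be preferred)
    rejected: str  # pre-refined response (assumed to be dispreferred)


def build_pair(context: str, draft: str, refined: str) -> PreferencePair:
    """Turn one self-refinement step into a synthetic preference pair."""
    return PreferencePair(context=context, chosen=refined, rejected=draft)


def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single pair, given sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    pair = build_pair(
        context="User: I failed my exam and feel useless.",
        draft="I'm sorry to hear that. Things will get better.",
        refined="Failing after all that studying must sting. What part felt hardest?",
    )
    # Made-up log-probabilities, for illustration only.
    print(pair.chosen)
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In this sketch the loss falls below log 2 once the policy favors the refined response more strongly than the reference model does, which is the direction each optimization iteration pushes toward.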

Paper Structure

This paper contains 44 sections, 8 equations, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: Example responses. Direct prompting of LLaMA results in verbose and formulaic outputs. Task-Specific SFT is empathetic but often lacks depth and variety, giving it a perceived "AI-like" quality. In contrast, self-reflection on user preferences provides a pathway to more specific and engaging responses.
  • Figure 2: Overview of our self-evolution framework, which enhances personalized emotional support capabilities through two learning phases: (1) Emotional Support Experience Acquisition: We fine-tune LLMs on minimal human-annotated ESC data, equipping them with basic emotional support capability. (2) Self-Improvement for Personalized Emotional Support: We utilize the LLMs' self-reflection abilities to tailor responses to the user's personality, situation, and emotions. The pre- and post-refined responses naturally form synthetic preference data. Iterative preference optimization on these pairs then trains the model to generate responses that align with the user's implicit preferences without explicit reflection steps.
  • Figure 3: Interactive pointwise human evaluation results. The results demonstrate that our self-evolution framework significantly enhances user experience, with $\mathcal{M}^1$ and $\mathcal{M}^2$ showing notable improvements in engagement, helpfulness, and informativeness.
  • Figure 4: Interactive pairwise human evaluation results obtained using LLaMA-3-8B-Instruct as the backbone model. In the 'A vs B' comparisons, the three $\blacksquare$ legend markers indicate 'A win', 'tie', and 'B win', respectively. Notably, $\mathcal{M}^2$ and $\mathcal{M}^1$ excel over $\mathcal{M}^0$, suggesting the effectiveness of implicit user preference learning.
  • Figure 5: (a) Distribution of response relevance to user statements in the dialogue history. The higher relevance to the user in chosen responses indicates that self-reflection on the user's situations and implicit preferences improves response quality. (b) Similarity distribution between chosen and rejected responses across different iterations.
  • ...and 8 more figures