CTSM: Combining Trait and State Emotions for Empathetic Response Model

Wang Yufeng, Chen Chao, Yang Zhou, Wang Shuhui, Liao Xiangwen

TL;DR

CTSM tackles empathetic response generation by jointly modeling static trait emotions and dynamic state emotions at the token level. It introduces an emotion encoding module to construct trait and state embeddings, an emotion guidance module with teacher–student soft-label learning, and a cross-contrastive learning decoder to align generated responses with contextual emotions. Experiments on EmpatheticDialogues show CTSM outperforms state-of-the-art baselines in emotion accuracy, response diversity, and human-perceived empathy, relevance, and fluency. The work highlights the importance of integrating trait and state emotions for more accurate emotional perception and more natural, empathetic dialogue generation, with practical implications for advanced empathetic AI systems.

Abstract

Empathetic response generation endeavors to empower dialogue systems to perceive speakers' emotions and generate empathetic responses accordingly. Psychological research demonstrates that emotion, as an essential factor in empathy, encompasses trait emotions, which are static and context-independent, and state emotions, which are dynamic and context-dependent. However, previous studies treat them in isolation, leading to insufficient emotional perception of the context, and subsequently, less effective empathetic expression. To address this problem, we propose Combining Trait and State emotions for Empathetic Response Model (CTSM). Specifically, to sufficiently perceive emotions in dialogue, we first construct and encode trait and state emotion embeddings, and then we further enhance emotional perception capability through an emotion guidance module that guides emotion representation. In addition, we propose a cross-contrastive learning decoder to enhance the model's empathetic expression capability by aligning trait and state emotions between generated responses and contexts. Both automatic and manual evaluation results demonstrate that CTSM outperforms state-of-the-art baselines and can generate more empathetic responses. Our code is available at https://github.com/wangyufeng-empty/CTSM
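The emotion guidance module is summarized above as teacher-student soft-label learning. As a rough illustration of that general idea only, the sketch below computes a generic knowledge-distillation loss: the KL divergence between a teacher's and a student's temperature-softened emotion distributions. The function names, the temperature value, and the three-class logits are all hypothetical, and this is not taken from the CTSM code; the authors' actual loss may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Assumption: a generic distillation objective, used here only to
    illustrate what "soft-label learning" usually means.
    """
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three emotion classes.
teacher = [2.0, 0.5, -1.0]
student_close = [1.8, 0.6, -0.9]  # roughly agrees with the teacher
student_far = [-1.0, 0.5, 2.0]    # ranks the classes in reverse

print(soft_label_loss(teacher, student_close))
print(soft_label_loss(teacher, student_far))
```

A student whose distribution is closer to the teacher's soft labels gets a smaller loss, which is the signal a student model would be trained to minimize in this kind of setup.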

Paper Structure (31 sections, 25 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: An example of trait and state emotions. (a) The bar chart illustrates the context-independent trait emotions of the word "excited". (b) Two sample contexts that contain the word "excited". (c) The heat map displays the varying state emotions of "excited" across the two contexts, with warmer colors indicating stronger emotional intensity and cooler colors indicating weaker intensity.
  • Figure 2: An overview of the CTSM architecture. It consists of four parts: 1) Inference-Enriched Context Encoder; 2) Emotion Encoding Module; 3) Emotion Guidance Module; 4) Cross-Contrastive Learning Decoder.
  • Figure 3: Visualization of trait and state emotion polarities in the VAD space. When a word's trait and state emotional polarities align, they overlap; otherwise, an offset occurs (highlighted in blue).