Emotional Supporters often Use Multiple Strategies in a Single Turn
Xin Bai, Guanyi Chen, Tingting He, Chenlian Zhou, Yu Liu
TL;DR
This work shows that emotional supporters often deploy multiple strategies within a single turn, a pattern overlooked by prior formulations of the ESC task. Based on an analysis of the ESConv dataset, the authors redefine the task as generating the complete sequence of strategy–utterance pairs in a single supporter turn, and preprocess ESConv into variants for training models that explicitly capture consecutive strategy use. They evaluate three supervised approaches (MCC, MTL, CP) and prompt-based LLMs. Under the refined task, LLMs (notably GPT-4o and DeepSeek-R1) outperform both the supervised models and human supporters in human evaluations, while CP best captures the exact strategy sequences, leading on EMR and LR. These results challenge earlier claims about the limitations of LLMs in ESC and show that task formulation strongly shapes measured performance; the LLMs display holistic support capabilities, including asking questions and offering concrete suggestions. The work also introduces evaluation metrics tailored to multi-strategy generation (EMR, LR, ALD) and points to future work on more robust, multi-turn assessments with expert evaluators.
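The TL;DR names EMR, LR and ALD without defining them. As a minimal sketch, assuming EMR is the share of supporter turns whose predicted strategy sequence exactly matches the reference, and ALD is the average Levenshtein (edit) distance between predicted and reference strategy sequences, the two could be computed as below. LR is not defined here, so it is not sketched. The function names are hypothetical; the strategy labels are ESConv's.

```python
from typing import List, Sequence

def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two strategy sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

def _check(preds: List[List[str]], golds: List[List[str]]) -> None:
    if not golds or len(preds) != len(golds):
        raise ValueError("need equally many (non-zero) predicted and gold turns")

def exact_match_ratio(preds: List[List[str]], golds: List[List[str]]) -> float:
    """Assumed EMR: fraction of turns whose whole strategy sequence matches exactly."""
    _check(preds, golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def avg_levenshtein(preds: List[List[str]], golds: List[List[str]]) -> float:
    """Assumed ALD: mean edit distance between predicted and gold sequences."""
    _check(preds, golds)
    return sum(levenshtein(p, g) for p, g in zip(preds, golds)) / len(golds)

# Toy example: the second prediction misses one strategy.
gold = [["Question"], ["Reflection of feelings", "Providing Suggestions"]]
pred = [["Question"], ["Reflection of feelings"]]
print(exact_match_ratio(pred, gold))  # 0.5
print(avg_levenshtein(pred, gold))    # 0.5
```

Treating each strategy label as one symbol makes ALD order-sensitive, which matches the idea of evaluating strategy *sequences* rather than sets.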
Abstract
Emotional Support Conversations (ESC) are crucial for providing empathy, validation, and actionable guidance to individuals in distress. However, existing definitions of the ESC task oversimplify the structure of supportive responses, typically modelling them as single strategy-utterance pairs. Through a detailed corpus analysis of the ESConv dataset, we identify a common yet previously overlooked phenomenon: emotional supporters often employ multiple strategies consecutively within a single turn. We formally redefine the ESC task to account for this, proposing a revised formulation that requires generating the full sequence of strategy-utterance pairs given a dialogue history. To tackle this refined task, we explore several modelling approaches, including supervised deep learning models and prompted large language models (LLMs). Our experiments show that, under the redefined task, state-of-the-art LLMs outperform both supervised models and human supporters. Notably, contrary to some earlier findings, we observe that LLMs frequently ask questions and provide suggestions, demonstrating more holistic support capabilities.
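To make the redefined task concrete, here is a minimal sketch of turning one dialogue into training examples whose target is the full list of strategy-utterance pairs in a supporter turn. It assumes an ESConv-style record in which each utterance has `speaker` ("seeker" or "supporter"), `content`, and, for supporter utterances, `annotation["strategy"]`; the function name `build_examples` and the toy dialogue are hypothetical, not the authors' preprocessing code.

```python
from itertools import groupby

def build_examples(dialog):
    """Hypothetical preprocessing: map one dialogue to (history, target) pairs,
    where target lists every (strategy, utterance) pair in one supporter turn."""
    examples = []
    history = []  # (speaker, content) tuples seen so far
    # Consecutive utterances by the same speaker are merged into a single turn.
    for speaker, group in groupby(dialog, key=lambda u: u["speaker"]):
        utts = list(group)
        if speaker == "supporter":
            target = [(u["annotation"]["strategy"], u["content"]) for u in utts]
            # Copy history so later turns do not leak into this example.
            examples.append((list(history), target))
        history.extend((u["speaker"], u["content"]) for u in utts)
    return examples

# Toy dialogue (invented) in which the supporter uses two strategies in one turn.
dialog = [
    {"speaker": "seeker", "content": "I lost my job last week."},
    {"speaker": "supporter", "content": "I'm so sorry to hear that.",
     "annotation": {"strategy": "Reflection of feelings"}},
    {"speaker": "supporter", "content": "How have you been coping?",
     "annotation": {"strategy": "Question"}},
    {"speaker": "seeker", "content": "Not well."},
]
for hist, target in build_examples(dialog):
    print(len(hist), [s for s, _ in target])
```

Under the single-pair formulation, this turn would instead be split into two separate targets, each predicted without the other.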
