RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems

Adarsh Srinivasan, Jacob Dineen, Muhammad Umar Afzal, Muhammad Uzair Sarfraz, Irbaz B. Riaz, Ben Zhou

TL;DR

RECAP introduces a model-agnostic, inference-time five-stage framework (Reflect-Extract-Calibrate-Align-Produce) that makes emotional reasoning in medical dialogue explicit and auditable. By grounding reasoning in appraisal theory and quantifying emotion likelihoods with Likert scales, RECAP improves empathetic alignment without retraining, showing consistent gains across standardized emotion benchmarks and domain-specific health dialogues. Human clinician evaluations confirm enhanced empathy, personalization, and context sensitivity, while automated judgments reveal both opportunities and biases in LLM-based scoring. The work demonstrates that structured, interpretable emotional reasoning can advance patient-centered clinical AI with transparent oversight. Limitations include computational overhead, potential error propagation, and cultural or multimodal constraints, suggesting future integration with decision-support and longitudinal tracking.

Abstract

Large language models in healthcare often miss critical emotional cues, delivering medically sound but emotionally flat advice. Such responses are insufficient in clinical encounters, where distressed or vulnerable patients rely on empathic communication to support safety, adherence, and trust. We present RECAP (Reflect-Extract-Calibrate-Align-Produce), an inference-time framework that guides models through structured emotional reasoning without retraining. RECAP decomposes patient input into appraisal-theoretic stages, identifies psychological factors, and assigns Likert-based emotion likelihoods that clinicians can inspect or override, producing nuanced and auditable responses. Across EmoBench, SECEU, and EQ-Bench, RECAP improves emotional reasoning by 22-28% on 8B models and 10-13% on larger models over zero-shot baselines. In blinded evaluations, oncology clinicians rated RECAP's responses as more empathetic, supportive, and context-appropriate than prompting baselines. These findings demonstrate that modular, principled prompting can enhance emotional intelligence in medical AI while maintaining transparency and accountability for clinical deployment.
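The "inspect or override" idea can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class `EmotionRating`, the helper `format_ratings`, and the 1-5 range are all assumptions made for illustration, since the abstract does not specify the scale or any storage format. It keeps the model's Likert score and an optional clinician score side by side, so the original rating stays auditable after an override.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed 5-point scale; the abstract does not state the Likert range.
LIKERT_MIN, LIKERT_MAX = 1, 5


def _check(score: int) -> None:
    """Reject scores outside the assumed Likert range."""
    if not LIKERT_MIN <= score <= LIKERT_MAX:
        raise ValueError(f"score {score} outside {LIKERT_MIN}-{LIKERT_MAX}")


@dataclass
class EmotionRating:
    """Hypothetical record for one emotion's likelihood rating."""
    emotion: str
    model_score: int                       # score proposed by the model
    clinician_score: Optional[int] = None  # set only when a clinician overrides

    def __post_init__(self) -> None:
        _check(self.model_score)

    @property
    def effective(self) -> int:
        """The score that should condition generation."""
        if self.clinician_score is None:
            return self.model_score
        return self.clinician_score

    def override(self, score: int) -> None:
        """Record a clinician override without discarding the model score."""
        _check(score)
        self.clinician_score = score


def format_ratings(ratings: List[EmotionRating]) -> str:
    """Render ratings as text that could be placed in a generation prompt."""
    lines = []
    for r in ratings:
        line = f"{r.emotion}: {r.effective}/{LIKERT_MAX}"
        if r.clinician_score is not None:
            line += f" (clinician override of {r.model_score})"
        lines.append(line)
    return "\n".join(lines)
```

Keeping both scores, rather than overwriting the model's value, is one way to preserve the audit trail the abstract emphasizes.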

Paper Structure

This paper contains 45 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Patient input (left) is transformed into appraisal-theoretic intermediates with per-dimension Likert ratings (center), which condition generation to produce a targeted reply (right). Compared with a vanilla instruction-tuned model, RECAP explicitly acknowledges the patient’s emotions and provides concrete next steps. (Scenario text is synthetic for illustration.)
  • Figure 2: RECAP Pipeline for Emotional Alignment. Model-agnostic inference-time prompting that externalizes emotional reasoning into interpretable stages: (1) abstraction, (2) factor identification, (3) emotion enumeration, (4) Likert assessment, and (5) aligned response generation.
  • Figure 3: Representative synthetic patient scenarios. (a) Single-turn evaluation assesses individual response quality. (b) Multi-turn evaluation tracks conversational dynamics across 3 turns as the patient persona responds to system output.
  • Figure 4: Human evaluation results. (a,b) Mean ratings with standard error bars (1-5 scale). (c) Scenario-level win rates, showing the percentage of scenarios in which each method achieved the higher average rating.
  • Figure 5: Scenario quality distribution
  • ...and 4 more figures
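
The five stages named in the Figure 2 caption can be read as a chain of prompts in which each stage sees the outputs of the stages before it. The sketch below illustrates that reading under stated assumptions: the stage names follow the caption, but the prompt wording, the function `run_pipeline`, and the `llm` callable (any function mapping a prompt string to a completion string) are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Stage names follow the Figure 2 caption; the instructions are hypothetical.
STAGES: List[Tuple[str, str]] = [
    ("abstraction", "Summarize the patient's situation in neutral terms."),
    ("factor_identification", "List the psychological factors at play."),
    ("emotion_enumeration", "List the emotions the patient may be feeling."),
    ("likert_assessment", "Rate each listed emotion on a Likert scale."),
    ("aligned_response", "Write a reply that addresses the rated emotions."),
]


def run_pipeline(patient_text: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order and return every intermediate output.

    The returned dict is the audit trail: one entry per stage, in order.
    """
    trace: Dict[str, str] = {}
    for name, instruction in STAGES:
        # Earlier stage outputs are passed forward as labeled context.
        context = "\n".join(f"[{k}] {v}" for k, v in trace.items())
        prompt = f"Patient: {patient_text}\n{context}\nTask: {instruction}"
        trace[name] = llm(prompt)
    return trace
```

Because `llm` is just a callable, the same loop works with any model backend, which matches the model-agnostic, inference-time framing; returning the full trace rather than only the final reply is what makes the intermediate reasoning inspectable.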