RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems
Adarsh Srinivasan, Jacob Dineen, Muhammad Umar Afzal, Muhammad Uzair Sarfraz, Irbaz B. Riaz, Ben Zhou
TL;DR
RECAP is a model-agnostic, five-stage inference-time framework (Reflect-Extract-Calibrate-Align-Produce) that makes emotional reasoning in medical dialogue explicit and auditable. By grounding reasoning in appraisal theory and expressing emotion likelihoods on Likert scales, RECAP improves empathetic alignment without retraining, with consistent gains on standardized emotion benchmarks and domain-specific health dialogues. Clinician evaluations confirm improved empathy, personalization, and context sensitivity, while automated judgments reveal both opportunities and biases in LLM-based scoring. The work shows that structured, interpretable emotional reasoning can advance patient-centered clinical AI under transparent oversight. Limitations include computational overhead, potential error propagation, and cultural and multimodal constraints; suggested future directions include integration with decision-support systems and longitudinal tracking.
Abstract
Large language models in healthcare often miss critical emotional cues, delivering medically sound but emotionally flat advice. Such responses are insufficient in clinical encounters, where distressed or vulnerable patients rely on empathic communication to support safety, adherence, and trust. We present RECAP (Reflect-Extract-Calibrate-Align-Produce), an inference-time framework that guides models through structured emotional reasoning without retraining. RECAP decomposes patient input into appraisal-theoretic stages, identifies psychological factors, and assigns Likert-based emotion likelihoods that clinicians can inspect or override, producing nuanced and auditable responses. Across EmoBench, SECEU, and EQ-Bench, RECAP improves emotional reasoning by 22-28% on 8B models and 10-13% on larger models over zero-shot baselines. In blinded evaluations, oncology clinicians rated RECAP's responses as more empathetic, supportive, and context-appropriate than prompting baselines. These findings demonstrate that modular, principled prompting can enhance emotional intelligence in medical AI while maintaining transparency and accountability for clinical deployment.
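The staged flow described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the prompt templates, the 1-5 Likert range, the `emotion: score` output format, and the names `run_recap`, `parse_likert`, and `Trace` are all hypothetical, and `llm` stands for any text-in, text-out callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

STAGES = ["reflect", "extract", "calibrate", "align", "produce"]

# Hypothetical prompt templates (assumptions; the paper's prompts are not shown here).
TEMPLATES = {
    "reflect": "Restate the patient's situation in your own words.\n{context}",
    "extract": "List the psychological factors at play.\n{context}",
    "calibrate": "Rate each likely emotion from 1 to 5, one 'emotion: score' per line.\n{context}",
    "align": "Describe the tone and content an empathic reply needs.\n{context}",
    "produce": "Write the final reply to the patient.\n{context}",
}


@dataclass
class Trace:
    """Auditable record of every intermediate stage output."""
    patient_input: str
    outputs: Dict[str, str] = field(default_factory=dict)
    likert: Dict[str, int] = field(default_factory=dict)


def parse_likert(text: str) -> Dict[str, int]:
    """Parse 'emotion: score' lines, clamping scores to the assumed 1-5 range."""
    scores: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            scores[name.strip().lower()] = min(5, max(1, int(value.strip())))
    return scores


def run_recap(
    patient_input: str,
    llm: Callable[[str], str],
    override: Optional[Dict[str, int]] = None,
) -> Trace:
    """Run the five stages in order, feeding each output into the next prompt."""
    trace = Trace(patient_input)
    context = patient_input
    for stage in STAGES:
        out = llm(TEMPLATES[stage].format(context=context))
        if stage == "calibrate":
            trace.likert = parse_likert(out)
            if override:
                # Clinician-supplied scores replace the model's estimates.
                trace.likert.update(override)
            out = "\n".join(f"{k}: {v}" for k, v in trace.likert.items())
        trace.outputs[stage] = out
        context += f"\n[{stage}] {out}"
    return trace
```

Because every stage output is stored in the trace, a reviewer can inspect the intermediate reasoning, and passing `override={"anxiety": 2}` changes the scores that the later stages see.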
