Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT
Anuja Tayal, Devika Salunke, Barbara Di Eugenio, Paula Allen-Meares, Eulalia Puig Abril, Olga Garcia, Carolyn Dickens, Andrew Boyd
TL;DR
This study directly compares a neurosymbolic, task-focused dialogue system (HFFood-NS) with a GPT-4–based dialogue system (HFFood-GPT) for answering heart failure patients’ questions about salt content in foods. In a within-subject design with $n=20$ hospitalized African-American patients, intrinsic metrics (task completion, slot accuracy, word error rate (WER)) and extrinsic measures (post-survey perceptions) show that HFFood-NS achieves higher task completion and more concise responses, while HFFood-GPT makes fewer speech errors and needs fewer clarifications. The results highlight distinct trade-offs: precision and reliability from a controlled neurosymbolic approach versus conversational fluency and adaptability from a GPT-based system. The authors advocate exploring hybrid architectures that combine the strengths of both paradigms to support patient-centered dietary guidance in clinical settings.
Abstract
Conversational assistants are becoming increasingly popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders that can highlight the advantages and disadvantages of more traditional architectures and of those based on generative AI. We present a within-group user study comparing two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version of the system was developed in-house with a neurosymbolic architecture, and the other is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks, and is less verbose than the one based on ChatGPT; on the other hand, the one based on ChatGPT makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.
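Speech errors of the kind compared here are commonly quantified with word error rate (WER): the word-level edit distance between what the user said and what the recognizer transcribed, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate`, the lowercase-and-split tokenization, and the example utterances are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization (lowercasing and whitespace splitting) is an assumption;
    real evaluations often also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance and transcription, not taken from the study data.
spoken = "how much salt is in bread"
transcribed = "how much salt in bed"
print(f"WER = {word_error_rate(spoken, transcribed):.3f}")
```

In this made-up example the transcription drops one word and substitutes another, so two of the six reference words are in error. Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.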
