Chatbot To Help Patients Understand Their Health

Won Seok Jang, Hieu Tran, Manav Mistry, SaiKiran Gandluri, Yifan Zhang, Sharmin Sultana, Sunjae Kown, Yuan Zhang, Zonghai Yao, Hong Yu

TL;DR

The paper addresses patient health literacy by developing NoteAid-Chatbot, a lightweight, multi-agent LLM system trained with a two-stage pipeline: supervised fine-tuning on synthetic Gold/Silver datasets followed by PPO-based RL alignment that rewards patient comprehension in discharge scenarios. It demonstrates that simple RL signals can yield emergent, concise, and education-focused dialogue, with NoteAid-Chatbot outperforming baselines in generation quality, content coverage, and conversation strategy while achieving near-human performance in Turing-like evaluation. The work showcases the feasibility of low-cost, open-domain RL methods for patient education and highlights the importance of evaluative frameworks (LLM-as-a-judge, case studies) for alignment in safety-sensitive domains. It also discusses ethical considerations and limitations, including hallucination risks and the need for human-in-the-loop safeguards in clinical contexts, and suggests future directions like more robust evaluation and test-time optimization.

Abstract

Patients must possess the knowledge necessary to actively participate in their care. We present NoteAid-Chatbot, a conversational AI that promotes patient understanding via a novel 'learning as conversation' framework, built on a multi-agent large language model (LLM) and reinforcement learning (RL) setup without human-labeled data. NoteAid-Chatbot was built on a lightweight LLaMA 3.2 3B model trained in two stages: initial supervised fine-tuning on conversational data synthetically generated using medical conversation strategies, followed by RL with rewards derived from patient understanding assessments in simulated hospital discharge scenarios. Our evaluation, which includes comprehensive human-aligned assessments and case studies, demonstrates that NoteAid-Chatbot exhibits key emergent behaviors critical for patient education, such as clarity, relevance, and structured dialogue, even though it received no explicit supervision for these attributes. Our results show that even simple Proximal Policy Optimization (PPO)-based reward modeling can successfully train lightweight, domain-specific chatbots to handle multi-turn interactions, incorporate diverse educational strategies, and meet nuanced communication objectives. Our Turing test demonstrates that NoteAid-Chatbot surpasses non-expert humans. Although our current focus is on healthcare, the framework we present illustrates the feasibility and promise of applying low-cost, PPO-based RL to realistic, open-ended conversational domains, broadening the applicability of RL-based alignment methods.
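To make the idea of a reward "derived from patient understanding assessments" concrete, here is a minimal sketch. It assumes the reward is simply the fraction of comprehension questions that the simulated patient answers correctly after a conversation; this section does not state the exact reward formula, so that choice is an assumption. The names `Question`, `comprehension_reward`, and `answer_fn` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    """One multiple-choice comprehension item (hypothetical structure)."""
    prompt: str
    choices: Sequence[str]
    answer_index: int  # index of the correct choice


def comprehension_reward(
    questions: Sequence[Question],
    answer_fn: Callable[[Question], int],
) -> float:
    """Return the fraction of questions answered correctly, in [0, 1].

    `answer_fn` stands in for the simulated patient agent answering the
    post-conversation exam. Using accuracy as the scalar reward is an
    assumption, not the paper's confirmed formula.
    """
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if answer_fn(q) == q.answer_index)
    return correct / len(questions)
```

In an RL loop, a scalar like this would be assigned to the chatbot's side of the finished conversation and passed to the PPO update. Because the signal depends only on exam answers, it can be checked automatically without human labels.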

Paper Structure

This paper contains 41 sections, 5 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: Overview of our multi-agent framework and interactive patient education experiment. (Left: Model development) The NoteAid-Chatbot training pipeline. We first construct two datasets: 1) a Gold dataset that consists of real-world EHR notes and questionnaires annotated by experts, and 2) a Silver dataset, which is a synthetic dataset (EHR notes, conversation records, questionnaires) generated using six medical content criteria and a medical conversation strategy. We apply supervised fine-tuning on this conversation dataset to build a baseline chatbot model. Leveraging the Silver dataset, we align the chatbot via reinforcement learning (PPO), where NoteAid-Chatbot interacts with the patient agent (GPT-4o-mini) and receives verifiable reward signals based on the patient's performance on the comprehension test. This two-stage alignment enables emergent instructional behaviors in small language models (SLMs). (Right: Evaluation) We evaluate NoteAid-Chatbot with the Gold comprehension dataset and conduct a general evaluation and a Turing test. The upper part illustrates the generation evaluation, based on a simulation with a virtual patient derived from the Gold and Silver datasets. We evaluated our model's medical content generation and medical conversation strategies. The lower part illustrates NoteAid-Chatbot in the Turing test. NoteAid-Chatbot poses questions derived from a patient's discharge note to improve their understanding through interactive question answering. At the end of the session, the patient completes an exam assessing comprehension, which serves as the measurable learning outcome.
  • Figure 2: As RL-based alignment training progresses, the comprehension score increases while the Flesch-Kincaid Grade Level (FKGL) score of the text decreases. The Medical Content score and the Medical Conversation Strategy score also increase, while the mean length of the generated text in tokens decreases over training steps during the reinforcement learning stage.
  • Figure 3: We compare the responses generated by NoteAid-Chatbot with those of the supervised fine-tuned LLaMA 3.2-3B-Instruct model. For each question posed, NoteAid-Chatbot consistently conveys equivalent content in a more concise and efficient manner.
  • Figure 4: Guidelines for initial questionnaire generation for $Q$
  • Figure 5: Guidelines for questionnaire modification for $Q$
  • ...and 11 more figures
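
For readers unfamiliar with the readability metric in the Figure 2 caption, the sketch below computes the standard Flesch-Kincaid Grade Level: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A lower score means simpler text. The formula itself is standard. The syllable counter is a crude vowel-group heuristic chosen for brevity, and the function names are hypothetical; neither is necessarily what the authors used.

```python
import re


def fkgl_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def naive_syllables(word: str) -> int:
    """Heuristic (assumption): count runs of vowels, with a minimum of 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text: str) -> float:
    """Approximate FKGL of a text using the naive syllable heuristic."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return fkgl_from_counts(len(words), len(sentences), syllables)
```

With this approximation, a short instruction such as "Take one pill daily." scores lower than a jargon-heavy sentence with long words, which matches the direction of the trend the caption describes.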