Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study
Niklas Mannhardt, Elizabeth Bondi-Kelly, Barbara Lam, Hussein Mozannar, Chloe O'Connell, Mercy Asiedu, Alejandro Buendia, Tatiana Urman, Irbaz B. Riaz, Catherine E. Ricciardi, Monica Agrawal, Marzyeh Ghassemi, David Sontag
TL;DR
This mixed-methods study evaluates an end-to-end LLM-assisted tool that augments patient-facing clinical notes with five outputs (Definitions, Simplification, FAQ, Key Information, and To-do List) to improve comprehension among breast cancer patients. Using real and synthetic notes, a survey (N=200), and interviews (N=7), the study shows that the Select and All augmentation levels significantly improve understanding of follow-up actions as well as self-reported comprehension and confidence, with more pronounced effects for real notes. A detailed error taxonomy and automated readability metrics show that, while augmentations improve readability, certain definitions and real-note augmentations introduce potentially harmful or misleading errors, underscoring the need for clinician review and cautious deployment. The findings support careful, participatory design of patient-facing AI tools that empower patients while maintaining trust and safety in clinical communication.
Abstract
Large language models (LLMs) have immense potential to make information more accessible, particularly in medicine, where complex medical jargon can hinder patient comprehension of clinical notes. We developed a patient-facing tool using LLMs to make clinical notes more readable by simplifying, extracting information from, and adding context to the notes. We piloted the tool with clinical notes donated by patients with a history of breast cancer and synthetic notes from a clinician. Participants (N=200, healthy, female-identifying patients) were randomly assigned three clinical notes in our tool with varying levels of augmentations and answered quantitative and qualitative questions evaluating their understanding of follow-up actions. Augmentations significantly increased their quantitative understanding scores. In-depth interviews were conducted with participants (N=7, patients with a history of breast cancer), revealing both positive sentiments about the augmentations and concerns about AI. We also performed a qualitative clinician-driven analysis of the model's error modes.
