Editing with AI: How Doctors Refine LLM-Generated Answers to Patient Queries

Rahul Sharma, Pragnya Ramjee, Kaushik Murali, Mohit Jain

TL;DR

This work investigates how physicians refine LLM-generated answers to patient questions in ophthalmology, focusing on cataract surgery. Through a mixed-methods study with nine doctors across three co-authoring modes (Write, Edit, Instruct), it shows that while LLM drafts are generally accurate, human contextualization and oversight are essential to avoid errors and automation bias. The findings reveal that direct or instruction-based editing improves quality but affects workload and introduces distinct risks, underscoring the need for hybrid workflows, contextual grounding, and governance to achieve safe, scalable AI-assisted clinical communication. The study provides concrete design and policy recommendations to balance standardization with personalization in patient education, with implications for broader high-stakes medical domains.

Abstract

Patients frequently seek information during their medical journeys, but the rising volume of digital patient messages has strained healthcare systems. Large language models (LLMs) offer promise in generating draft responses for clinicians, yet how physicians refine these drafts remains underexplored. We present a mixed-methods study with nine ophthalmologists answering 144 cataract surgery questions across three conditions: writing from scratch, directly editing LLM drafts, and instruction-based indirect editing. Our quantitative and qualitative analyses reveal that while LLM outputs were generally accurate, occasional errors and automation bias underscored the need for human oversight. Contextualization (adapting generic answers to local practices and patient expectations) emerged as a dominant form of editing. Editing workflows revealed trade-offs: indirect editing reduced effort but introduced errors, while direct editing ensured precision at the cost of higher workload. We conclude with design and policy implications for building safe, scalable LLM-assisted clinical communication systems.

Paper Structure

This paper contains 27 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Web-based study interface. (Left) Condition 1: Writing from Scratch (Write), where doctors write answers in text box B. (Middle) Condition 2: Direct Editing (Edit), where doctors edit pre-generated LLM answers in text box D. (Right) Condition 3: Indirect Editing (Instruct), where doctors provide instructions for revision (I), with changes visually highlighted (F).
  • Figure 2: Phases of our mixed-methods evaluation study.