
Assessment of AI-Generated Pediatric Rehabilitation SOAP-Note Quality

Solomon Amenyo, Maura R. Grossman, Daniel G. Brown, Brendan Wylie-Toal

TL;DR

This paper investigates AI-generated SOAP notes in pediatric rehabilitation by comparing Copilot and KAUWbot to human-authored notes using blind clinician evaluations of 432 notes. It employs a PDQI-9–inspired five-criterion rubric and a four-clinician evaluation to assess note quality, with notes anonymized and randomized to prevent bias. The findings show AI-generated notes achieve quality comparable to human-authored notes, and accuracy improves when AI drafts are edited by clinicians, with KAUWbot-edited notes performing best. The study supports a human-in-the-loop approach that can reduce documentation burden while maintaining high-quality clinical documentation, informing practical AI integration in pediatric rehabilitation and similar settings.
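The blinding step described above, where notes are anonymized and presented in random order, can be illustrated with a minimal Python sketch. It is not the authors' procedure. The `Note` type, the source labels, the `N001`-style anonymous IDs, and the fixed seed are all hypothetical. Raters see only an anonymous ID and the note text. The mapping back to each note's true source is kept in a separate key that is consulted only after scoring.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    source: str  # hypothetical pool label, e.g. "human", "copilot", "kauwbot"
    text: str


def blind_and_shuffle(notes, seed=0):
    """Return (presented, key).

    presented: list of (anonymous_id, text) pairs in shuffled order, with the
               source label withheld from raters.
    key:       private dict mapping each anonymous_id to the note's true source.
    """
    rng = random.Random(seed)  # seeded so the presentation order is reproducible
    order = list(range(len(notes)))
    rng.shuffle(order)
    presented = []
    key = {}
    for position, idx in enumerate(order):
        anon_id = f"N{position + 1:03d}"  # hypothetical ID format
        presented.append((anon_id, notes[idx].text))
        key[anon_id] = notes[idx].source
    return presented, key


notes = [Note("human", "S: ..."), Note("copilot", "S: ..."), Note("kauwbot", "S: ...")]
presented, key = blind_and_shuffle(notes, seed=42)
```

Seeding the shuffle keeps the presentation order reproducible for auditing, while raters still cannot tell which pool a note came from.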

Abstract

This study explores the integration of artificial intelligence (AI), specifically large language models (LLMs), into pediatric rehabilitation clinical documentation, focusing on the generation of SOAP (Subjective, Objective, Assessment, Plan) notes, which are essential for patient care. Creating complex documentation is time-consuming in pediatric settings. We evaluate the effectiveness of two AI tools in simplifying and automating this process: Copilot, a commercial LLM, and KAUWbot, a fine-tuned LLM developed for KidsAbility Centre for Child Development (an Ontario pediatric rehabilitation facility). We focus on two key questions: (i) how does the quality of AI-generated SOAP notes based on short clinician summaries compare to that of human-authored notes, and (ii) to what extent is human editing necessary to improve AI-generated SOAP notes? We found no evidence of prior work assessing the quality of AI-generated clinical notes in pediatric rehabilitation. We use a sample of 432 SOAP notes, evenly divided among human-authored, Copilot-generated, and KAUWbot-generated notes, and employ a blind evaluation by experienced clinicians based on a custom rubric. We then conduct statistical analysis to assess the quality of the notes and the impact of human editing. The results suggest that AI tools such as KAUWbot and Copilot can generate SOAP notes with quality comparable to that of human-authored notes. We highlight the potential for combining AI with human expertise to enhance clinical documentation and offer insights for the future integration of AI into pediatric rehabilitation practice and other clinical settings.
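To make the scoring arithmetic concrete, the following sketch shows one plausible way to turn rubric ratings into per-note and per-pool mean quality scores. With 432 notes split evenly across three sources, each pool holds 144 notes. The function names, the 1 to 5 scale, and the rule that a note's score is the unweighted mean over all raters and criteria are assumptions for illustration. This section does not describe the paper's exact aggregation or its statistical tests. The ratings below are toy values, not study data.

```python
from statistics import mean

# Assumption: each rater scores each of five rubric criteria on a 1-5 scale.
NUM_CRITERIA = 5


def note_score(ratings):
    """Unweighted mean over every (rater, criterion) score for one note.

    `ratings` holds one inner list per rater, each with NUM_CRITERIA scores
    (hypothetical layout).
    """
    for rater_scores in ratings:
        if len(rater_scores) != NUM_CRITERIA:
            raise ValueError("each rater must score every criterion")
    return mean(score for rater_scores in ratings for score in rater_scores)


def pool_means(scored_notes):
    """Group note-level scores by pool and average them.

    `scored_notes` is an iterable of (pool_name, ratings) pairs.
    """
    by_pool = {}
    for pool, ratings in scored_notes:
        by_pool.setdefault(pool, []).append(note_score(ratings))
    return {pool: mean(scores) for pool, scores in by_pool.items()}


# Toy ratings from two hypothetical raters; NOT data from the study.
example = [
    ("human", [[4, 4, 5, 4, 3], [5, 4, 4, 4, 4]]),
    ("kauwbot", [[5, 4, 4, 5, 4], [4, 4, 4, 4, 4]]),
]
print(pool_means(example))
```

Averaging per note before averaging per pool gives every note equal weight. Pooling all raw scores instead gives the same result only when every note receives the same number of ratings.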

Paper Structure

This paper contains 33 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Sample Prompt for Copilot, Sample Scratch Note, and Generated Copilot SOAP Note.
  • Figure 2: Histogram of Mean Quality Scores for the SOAP Note Pools.