Retrieval-Augmented Generation of Pediatric Speech-Language Pathology Vignettes: A Proof-of-Concept Study
Yilan Liu
TL;DR
This paper presents a proof-of-concept system that uses retrieval-augmented generation (RAG) to ground pediatric speech-language pathology (SLP) vignette generation in curated domain knowledge. By integrating a structured knowledge base with engineered prompt templates, the authors evaluate multi-model LLM backends (commercial and open-source) and demonstrate 100% generation success across 35 test cases spanning 11 disorders and 6 categories. Automated quality metrics show modest advantages for commercial models, while open-source models achieve acceptable performance, highlighting potential for privacy-preserving institutional deployment. The work establishes a foundation for scalable, standards-aligned SLP educational content and outlines extensive validation requirements, including expert reviews and psychometric studies, before educational or clinical adoption. Practical implications encompass future uses in clinical decision support, automated IEP goal generation, and enhanced clinical reflection training, with broader relevance to domain-grounded AI in health professions education.
Abstract
Clinical vignettes are essential educational tools in speech-language pathology (SLP), but creating them manually is time-intensive. General-purpose large language models (LLMs) can generate text, but they lack domain-specific knowledge, which leads to hallucinations and requires extensive expert revision. This study presents a proof-of-concept system that integrates retrieval-augmented generation (RAG) with curated knowledge bases to generate pediatric SLP case materials. The prototype combines curated domain knowledge with engineered prompt templates and supports five LLMs: three commercial (GPT-4o, Claude 3.5 Sonnet, Gemini 2.5 Pro) and two open-source (Llama 3.2, Qwen 2.5-7B). Seven test scenarios spanning diverse disorder types and grade levels were systematically designed. Generated cases underwent automated quality assessment using a multi-dimensional rubric that evaluated structural completeness, internal consistency, clinical appropriateness, and IEP goal/session note quality. This proof-of-concept demonstrates the technical feasibility of RAG-augmented generation of pediatric SLP vignettes. Commercial models showed marginal quality advantages, but open-source alternatives achieved acceptable performance, suggesting potential for privacy-preserving institutional deployment. Integration of curated knowledge bases enabled content generation aligned with professional guidelines. Extensive validation through expert review, student pilot testing, and psychometric evaluation is required before educational or research implementation. Future applications may extend to clinical decision support, automated IEP goal generation, and clinical reflection training.
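To make the retrieve-then-prompt pattern described above concrete, the following is a minimal sketch, not the study's implementation. It assumes a toy in-memory knowledge base and uses simple word overlap as a stand-in for whatever retrieval method the actual system employs. The names `Snippet`, `KNOWLEDGE_BASE`, `PROMPT_TEMPLATE`, `retrieve`, and `build_prompt` are hypothetical, and the knowledge-base entries are placeholders rather than content from the study.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """One curated knowledge-base entry (hypothetical schema)."""
    disorder: str
    text: str


# Placeholder entries for illustration only; not content from the study.
KNOWLEDGE_BASE = [
    Snippet("articulation disorder", "placeholder guidance on speech sound errors"),
    Snippet("stuttering", "placeholder guidance on fluency and disfluency types"),
    Snippet("language disorder", "placeholder guidance on vocabulary and grammar goals"),
]

# Hypothetical prompt template; the study's actual templates are not shown here.
PROMPT_TEMPLATE = (
    "You are drafting a pediatric SLP teaching vignette.\n"
    "Disorder: {disorder}\n"
    "Grade level: {grade}\n"
    "Use only the reference material below.\n"
    "Reference material:\n{context}\n"
)


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def retrieve(query: str, kb: list[Snippet], top_k: int = 2) -> list[Snippet]:
    """Rank snippets by word overlap with the query; drop zero-overlap entries."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(f"{s.disorder} {s.text}")), s) for s in kb]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked if score > 0][:top_k]


def build_prompt(disorder: str, grade: str, kb: list[Snippet]) -> str:
    """Ground the prompt in retrieved snippets before it is sent to any LLM backend."""
    snippets = retrieve(disorder, kb)
    context = "\n".join(f"- {s.text}" for s in snippets)
    return PROMPT_TEMPLATE.format(disorder=disorder, grade=grade, context=context)


if __name__ == "__main__":
    print(build_prompt("stuttering", "grade 3", KNOWLEDGE_BASE))
```

The resulting prompt string is independent of the model, so the same grounded prompt could in principle be passed to any of the commercial or open-source backends.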
