Digitally Supported Analysis of Spontaneous Speech (DigiSpon): Benchmarking NLP-Supported Language Sample Analysis of Swiss Children's Speech
Anja Ryser, Yingqiang Gao, Sarah Ebling
TL;DR
This study addresses the labor-intensive nature of language sample analysis (LSA) for diagnosing developmental language disorder (DLD) by proposing a natural language processing (NLP) pipeline that does not rely on large language models (LLMs) and preserves data privacy. Using data from 119 Swiss children and clinicians, the authors combine manual transcription and locally deployed automatic speech recognition (ASR) with part-of-speech (POS) tagging and morphological analysis to build semi-automatic DLD feature profiles in Swiss German and Swiss Standard German. They assess the zero-shot capabilities of ASR and POS tagging without commercial LLMs, reporting high inter-annotator agreement and reasonable automatic tagging performance, with normalization improving transcription quality. The work demonstrates that integrating locally deployed NLP tools into LSA workflows is feasible and ethically viable, and it outlines a path toward larger datasets, dialect-aware models, and deeper syntactic analyses for semi-automatic DLD diagnosis in Switzerland.
Abstract
Language sample analysis (LSA) complements standardized psychometric tests in diagnosing conditions such as developmental language disorder (DLD) in children. However, its labor-intensive nature has limited its use in speech-language pathology practice. We introduce an approach that applies natural language processing (NLP) methods, none of which are based on commercial large language models (LLMs), to transcribed speech data from 119 children with typical and atypical language development in the German-speaking part of Switzerland. The study aims to identify optimal practices that help speech-language pathologists diagnose DLD more efficiently within a human-in-the-loop framework, without relying on potentially unethical implementations that leverage commercial LLMs. Preliminary findings underscore the potential of integrating locally deployed NLP methods into semi-automatic LSA.
