
Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances

Jiyun Chun, Eric Fosler-Lussier, Michael White, Andrew Perrault

TL;DR

This work tackles the challenge of evaluating child utterances in adult–child dialogue by moving beyond length-based proxies to context-aware metrics. It introduces two developmentally informed axes, Expansion ($E$) and Independence ($I$), and uses an LLM-as-a-judge conditioned on the Previous Adult Utterance Type ($PT$) to score each utterance for contextual elaboration and discourse advancement. Using CHILDES data and modeling that disentangles the scores from utterance length, the authors show that $E$ and $I$ capture distinct developmental signals, improve age prediction over baselines, and are semantically sensitive to discourse markers, while aligning substantially with human judgments. The framework supports scalable, interpretable assessment of child discourse that rewards meaningful contribution over sheer length, with practical uses in educational tutoring, dialogue auditing, and AI safety. The authors acknowledge limitations tied to transcription modality and to possible pretraining contamination of the LLM baselines. The approach lays groundwork for robust, context-sensitive evaluation of child language development across modalities and languages.
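One common way to separate a score from utterance length, shown here only as an illustration and not necessarily the authors' length-disentangling model, is to regress the score on length and keep the residual. Correlating the residualized score with the residualized age gives a partial correlation: the age signal that length alone does not explain. The function names below are hypothetical, and a single linear length covariate is an assumption.

```python
import numpy as np


def residualize(values, length):
    """Return the part of `values` not linearly explained by `length` (OLS residual).

    Hypothetical helper; assumes one linear length covariate plus an intercept.
    """
    values = np.asarray(values, dtype=float)
    length = np.asarray(length, dtype=float)
    # Design matrix: intercept column and the length covariate.
    X = np.column_stack([np.ones_like(length), length])
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    return values - X @ coef


def length_controlled_corr(score, length, age):
    """Partial correlation of `score` with `age`, controlling for `length`.

    Computed as the Pearson correlation between the two length residuals.
    """
    r_score = residualize(score, length)
    r_age = residualize(age, length)
    return float(np.corrcoef(r_score, r_age)[0, 1])
```

With per-utterance arrays `score` (for example $E$ or $I$), `length`, and `age`, `length_controlled_corr(score, length, age)` stays high only if the score tracks age beyond what length predicts; a score that is merely a linear function of length leaves essentially no residual signal.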

Abstract

Evaluating the quality of children's utterances in adult-child dialogue remains challenging because context-sensitive metrics are lacking. Common proxies such as Mean Length of Utterance (MLU), lexical diversity (vocd-D), and readability indices (Flesch-Kincaid Grade Level, Gunning Fog Index) are dominated by length and ignore conversational context, so they miss aspects of response quality such as reasoning depth, topic maintenance, and discourse planning. We introduce an LLM-as-a-judge framework that first classifies the Previous Adult Utterance Type and then scores the child's response along two axes: Expansion (contextual elaboration and inferential depth) and Independence (the child's contribution to advancing the discourse). These axes reflect fundamental dimensions of child language development. Expansion captures elaboration, clause combining, and the use of causal and contrastive connectives; Independence captures initiative, topic control, decreasing reliance on adult scaffolding as self-regulation grows, and audience design. We establish developmental validity by showing age-related patterns, and we demonstrate predictive value by improving age estimation over common baselines. We further confirm semantic sensitivity by detecting score differences tied to discourse relations. Our metrics align with human judgments, enabling large-scale evaluation. This framework shifts child utterance assessment from simply measuring length to evaluating how meaningfully the child's speech contributes to, and advances, the conversation within its context.
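To make the "dominated by length" point concrete, the sketch below computes a word-based MLU and the standard Flesch-Kincaid Grade Level and Gunning Fog formulas over a list of child utterances. Assumptions: each utterance is treated as one sentence, syllables are counted with a crude vowel-group heuristic, MLU is counted in words rather than morphemes, and vocd-D (which relies on repeated random sampling of tokens) is omitted. The helper names are hypothetical, not from the paper. Both readability formulas contain a words-per-sentence term, so repeating even one-syllable words raises the score without adding content.

```python
import re

# Crude syllable heuristic (assumption): one syllable per run of vowels.
VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Approximate syllables as vowel groups, with at least one per word."""
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))


def tokenize(utterance):
    """Split an utterance into alphabetic word tokens (apostrophes kept)."""
    return re.findall(r"[A-Za-z']+", utterance)


def all_words(utterances):
    return [w for u in utterances for w in tokenize(u)]


def mlu_words(utterances):
    """Word-based MLU: mean words per utterance (clinical MLU counts morphemes)."""
    counts = [len(tokenize(u)) for u in utterances]
    return sum(counts) / len(counts)


def fkgl(utterances):
    """Flesch-Kincaid Grade Level, treating each utterance as one sentence.

    Assumes at least one utterance and at least one word.
    """
    words = all_words(utterances)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(utterances) + 11.8 * syllables / len(words) - 15.59


def gunning_fog(utterances):
    """Gunning Fog Index with 'complex' = 3+ syllables.

    Gunning's exceptions (proper nouns, familiar compounds, -es/-ed/-ing
    suffixes) are not applied here.
    """
    words = all_words(utterances)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(utterances) + 100 * complex_words / len(words))
```

For example, `fkgl(["ball ball ball"])` exceeds `fkgl(["ball"])` purely because the utterance is longer, which is the kind of length sensitivity that context-aware scores such as Expansion and Independence are meant to avoid.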

Paper Structure (39 sections, 4 equations, 15 tables)