Evaluation scheme for children-centered language interaction competence of AI-driven robots
Siqi Xie, Jiantao Li
TL;DR
The paper addresses the need for a child-centered evaluation scheme for AI-driven robots' language interaction, arguing that adult benchmarks are insufficient for children. It develops a mixed-methods framework derived from a preliminary thematic analysis of 11 families and 411 shopping-feedback codes, then validates it through an empirical study with six children (ages 3–6) and their parents using the Alpha Egg GPT robot. The resulting five-dimension evaluation scheme (Interactivity, Specificity, Expansibility, Safety, Sociability) comprises 16 indicators and demonstrates acceptable reliability (Cronbach's alpha). The study also yields practical findings: simpler language, active parental involvement, and improved multi-speaker voice recognition boost performance. The work contributes a reliability-backed, child-centered framework that researchers and developers can use to assess and improve child–robot language interactions in real-world settings.
Abstract
This article explores how to evaluate the language communication proficiency of AI-driven robots that interact with children. The use of AI-driven robots in children's everyday communication is advancing rapidly, which underscores the importance of evaluating these robots' language communication skills. Based on interviews with 11 Chinese families and a thematic analysis of comment text from shopping websites, the article introduces a framework that assesses five key dimensions of child-robot language communication: interactivity, specificity, development, sociality, and safety. We draw on the concept of "children's agency", viewing children as active participants in shaping society and cultural life alongside adults. This article therefore places particular emphasis on collecting data related to children: in both survey interviews and direct interactive experiments, we treat children as independent sources of data. Following this framework, we conducted an empirical study that captured videos of children from 6 families interacting with a robot in natural conversation settings. We analyzed quantitative data obtained from the video recordings, alongside questionnaires and interviews involving parents who acted as participants or observers. We found that the presence or absence of parents during children's interactions with robots can affect the evaluation of robots' language communication abilities. Ultimately, this article proposes an enhanced, comprehensive evaluation framework that incorporates insights from parents and children and is supported by empirical evidence and inter-rater consistency assessments, which demonstrate the scheme's efficacy.
