Human-mediated Large Language Models for Robotic Intervention in Children with Autism Spectrum Disorders
Ruchik Mishra, Karla Conn Welch, Dan O Popa
TL;DR
This work tackles the scalability challenge of robotic ASD interventions by introducing a semi-autonomous pipeline in which a robot uses large language models to generate perspective-taking content. The system combines GPT-2 for context generation with either BART or GPT-2 for generating questions and options, while domain experts retain control to determine appropriateness and drive the session. Empirical results show that the GPT-2+BART pipeline achieves higher content-generation quality, as measured by BERTScore, than a GPT-2–only setup. Expert evaluations (NASA TLX, Godspeed, and Appropriateness) indicate that the approach does not increase workload and that the robot is perceived as safe, likable, and reliable. The findings suggest a feasible path toward scalable autonomous or semi-autonomous social-robot interventions for ASD, with opportunities to expand data collection, personalize models, and validate in naturalistic clinical settings.
Abstract
Robotic interventions for individuals with Autism Spectrum Disorder (ASD) have generally used pre-defined scripts to deliver verbal content during one-to-one therapy sessions. This practice restricts the use of robots to limited, predetermined instructional curricula. In this paper, we increase robot autonomy in one such robotic intervention for children with ASD: teaching perspective-taking. Our approach uses large language models (LLMs) to generate verbal content as text and then delivers it to the child via robotic speech. In the proposed pipeline, the robot teaches perspective-taking by taking on three roles: initiator, prompter, and reinforcer. We adopted a GPT-2 + BART pipeline to generate social situations, ask questions (as initiator), and give options (as prompter) when required. The robot encourages the child by giving positive reinforcement for correct answers (as reinforcer). In addition to our technical contribution, we conducted ten-minute sessions with domain experts simulating an actual perspective-taking teaching session, with the researcher acting as the child participant. These sessions validated our robotic intervention pipeline through surveys, including NASA TLX and Godspeed. We used BERTScore to compare our GPT-2 + BART pipeline with an all-GPT-2 pipeline and found that the former performed better. Based on the responses of the domain experts, the robot session demonstrated higher performance with no increase in mental demand, physical demand, temporal demand, effort, or frustration compared to a no-robot session. We also concluded that the domain experts perceived the robot as ideally safe, likable, and reliable.
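
The three robot roles described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Item`, `generate_item`, and `run_turn` are hypothetical, the generated content is a hard-coded stand-in for the GPT-2 and BART outputs, and the sketch assumes that options are offered only after an incorrect first answer and that the expert approves each item before it is spoken.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    situation: str      # social situation (GPT-2 in the paper; stubbed here)
    question: str       # perspective-taking question (BART in the paper; stubbed here)
    options: List[str]  # answer options offered when prompting
    answer: str         # expected answer


def generate_item() -> Item:
    # Hypothetical stand-in for the language-model generation step.
    return Item(
        situation="Sam dropped his ice cream on the ground.",
        question="How does Sam feel?",
        options=["happy", "sad"],
        answer="sad",
    )


def run_turn(
    item: Item,
    expert_approves: Callable[[Item], bool],
    child_replies: Callable[[], str],
    say: Callable[[str], None],
) -> bool:
    """Run one teaching turn; return True if the child answered correctly."""
    # Human mediation (assumed): the expert vets the content first.
    if not expert_approves(item):
        return False

    def is_correct(reply: str) -> bool:
        return reply.strip().lower() == item.answer.lower()

    # Initiator: present the situation and ask the question.
    say(item.situation)
    say(item.question)
    reply = child_replies()

    # Prompter: offer options only if the first answer was not correct.
    if not is_correct(reply):
        say("Is it " + " or ".join(item.options) + "?")
        reply = child_replies()

    # Reinforcer: give positive reinforcement for a correct answer.
    if is_correct(reply):
        say("Great job!")
        return True
    return False
```

In a real deployment, `say` would wrap the robot's text-to-speech interface and `expert_approves` would be driven by the domain expert's input; both are left as plain callables here so the control flow can be exercised without a robot.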
