Human-mediated Large Language Models for Robotic Intervention in Children with Autism Spectrum Disorders

Ruchik Mishra, Karla Conn Welch, Dan O Popa

TL;DR

This work tackles the scalability challenge of robotic ASD interventions by introducing a semi-autonomous pipeline in which a robot uses large language models to generate perspective-taking content. The system uses GPT-2 to generate the context and either BART or GPT-2 to generate questions and options, while domain experts retain control: they judge appropriateness and drive the session. Empirical results show that the GPT-2 + BART pipeline achieves higher content-generation quality (measured by BERTScore) than a GPT-2-only setup. Expert evaluations (NASA TLX, Godspeed, and Appropriateness) indicate that the approach does not increase workload and that the robot is perceived as safe, likable, and reliable. The findings suggest a feasible path toward scalable autonomous or semi-autonomous social-robot interventions for ASD, with opportunities to expand data collection, personalize models, and validate in naturalistic clinical settings.

Abstract

Robotic interventions for individuals with Autism Spectrum Disorder (ASD) have generally used pre-defined scripts to deliver verbal content during one-to-one therapy sessions. This practice restricts the use of robots to limited, pre-mediated instructional curricula. In this paper, we increase robot autonomy in one such robotic intervention for children with ASD by implementing perspective-taking teaching. Our approach uses large language models (LLMs) to generate verbal content as text, which is then delivered to the child via robotic speech. In the proposed pipeline, the robot teaches perspective-taking by taking up three roles: initiator, prompter, and reinforcer. We adopted a GPT-2 + BART pipeline to generate social situations, ask questions (as initiator), and give options (as prompter) when required. The robot encourages the child by giving positive reinforcement for correct answers (as reinforcer). In addition to our technical contribution, we conducted ten-minute sessions with domain experts simulating an actual perspective-taking teaching session, with the researcher acting as a child participant. These sessions validated our robotic intervention pipeline through surveys, including NASA TLX and Godspeed. We used BERTScore to compare our GPT-2 + BART pipeline with an all-GPT-2 pipeline and found that the former performed better. Based on the responses of the domain experts, the robot session demonstrated higher performance with no increase in mental demand, physical demand, temporal demand, effort, or frustration compared to a no-robot session. We also concluded that the domain experts perceived the robot as ideally safe, likable, and reliable.
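
The control flow described in the abstract (expert-gated content, then the initiator, prompter, and reinforcer roles) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Item`, `run_trial`, and all callback names are hypothetical, the language models are replaced by a pluggable `generate_item` callable, and two details are assumptions: that rejected content is simply regenerated, and that options are offered when the first reply is missing or judged incorrect.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    """One generated teaching item (hypothetical container)."""
    context: str        # social situation ("Ctx")
    question: str       # question about the situation ("Q")
    options: List[str]  # candidate answers ("A"-"C")


def run_trial(
    generate_item: Callable[[], Item],         # stands in for the LLM pipeline
    approve: Callable[[Item], bool],           # expert judges appropriateness
    get_reply: Callable[[], Optional[str]],    # child's reply, None if silent
    is_correct: Callable[[Item, str], bool],   # expert judges the reply
    say: Callable[[str], None],                # robot speech output
    max_tries: int = 3,
) -> bool:
    """Run one trial and return True if the child ends with a correct answer."""
    # Human mediation: nothing is spoken until the expert approves the content.
    # Regenerating on rejection is an assumption of this sketch.
    for _ in range(max_tries):
        item = generate_item()
        if approve(item):
            break
    else:
        raise RuntimeError("expert rejected every generated item")

    # Initiator: present the situation and ask the question.
    say(item.context)
    say(item.question)
    reply = get_reply()

    # Prompter: offer the options when required (assumed trigger: no reply
    # or a reply judged incorrect).
    if reply is None or not is_correct(item, reply):
        say("Options: " + "; ".join(item.options))
        reply = get_reply()

    # Reinforcer: positive reinforcement for a correct answer.
    correct = reply is not None and is_correct(item, reply)
    if correct:
        say("Great job!")
    return correct
```

Passing the generator, the expert's judgments, and the speech output as callables keeps the sketch independent of any particular model or robot API; in a real system they would wrap the language models, the expert's GUI, and the robot's text-to-speech.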

Paper Structure

This paper contains 20 sections, 14 equations, 14 figures, and 1 table.

Figures (14)

  • Figure 1: The setup shows the domain expert (actual) and the child with ASD (actor) in a session with the NAO robot. Here, 'Ctx' is the context describing a situation, 'Q' is the question based on that context, and A-C are the candidate answer options.
  • Figure 2: Pipeline of the human-mediated autonomous system.
  • Figure 3: User inputs that the domain experts enter on the computer in response to the questions shown on the GUI.
  • Figure 4: Example of the actual GUI for user input 3.
  • Figure 5: LLM pipelines used and compared in this paper.
  • ...and 9 more figures