
VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

Akhil Padmanabha, Jessie Yuan, Janavi Gupta, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson

TL;DR

This work tackles the design of LLM-based speech interfaces for physically assistive robots to support independence in people with motor impairments. It proposes a framework, developed iteratively over three versions, that integrates an off-the-shelf LLM with the Obi robot for feeding tasks, and validates the approach through a user study with 11 older adults using both qualitative and quantitative analyses. The key contributions are the final nine-component framework, five user-centered design guidelines (Customization, Multi-Step Instruction, Consistency, Comparable Time to Caregiver, Social Capability), and practical insights to guide researchers and designers deploying LLMs for assistive robotics. The study demonstrates the potential of combining prompt and system engineering with human-centered evaluation to create usable, safe, and adaptable speech interfaces for robot-assisted care, with implications for independence and quality of life.

Abstract

Physically assistive robots present an opportunity to significantly increase the well-being and independence of individuals with motor impairments or other forms of disability who are unable to complete activities of daily living. Speech interfaces, especially ones that utilize Large Language Models (LLMs), can enable individuals to effectively and naturally communicate high-level commands and nuanced preferences to robots. Frameworks for integrating LLMs as interfaces to robots for high-level task planning and code generation have been proposed, but they fail to incorporate human-centric considerations that are essential when developing assistive interfaces. In this work, we present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively through three stages of testing involving a feeding robot and culminating in an evaluation with 11 older adults at an independent living facility. We use both quantitative and qualitative data from the final study to validate our framework and additionally provide design guidelines for using LLMs as speech interfaces for assistive robots. Videos and supporting files are located on our project website: https://sites.google.com/andrew.cmu.edu/voicepilot/

Paper Structure

This paper contains 48 sections, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Obi robot and setup for study with 11 older adults at an independent living facility. The Obi robot's arm moves towards a participant's mouth with a spoonful of granola. A microphone is positioned to the right of the participant and a cheat sheet with example commands is placed to the left of the participant.
  • Figure 2: Our final framework, consisting of 9 components. The color and annotation at the top left of each numbered component indicate whether the component is related to prompt engineering (PE), system rollout (SR), or both prompt engineering and system rollout (PS). System rollout refers to the process of deploying the LLM-based speech interface and robot with users. The annotation at the bottom right of each numbered component shows the framework iteration in which it was added.
  • Figure 3: 9 of the 11 participants are shown at various stages of the Version 3 study at an independent living facility with older adults.
  • Figure 4: Participant self-reported success per task for each attempt. After the practice task and after each of the 5 predefined tasks, participants answered Yes/No to the question "Did the robot adequately complete the intended task?"
  • Figure 5: Responses to six 7-point Likert items answered by the 11 participants at the end of the study. Higher scores are better for all items, with 1 = Strongly Disagree and 7 = Strongly Agree.
  • ...and 4 more figures