Plan-and-Act using Large Language Models for Interactive Agreement
Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Katsushi Ikeuchi
TL;DR
The paper addresses runtime action planning in human-robot interaction (HRI) by using large language models (LLMs) to generate plans on the fly. It introduces a plan-and-act skill design that combines a bottom-up action set, an event-driven timing manager, and an action text describing the robot's current action as input to the LLM, so that the robot can balance respecting the human's activity against pursuing its own goals. The design is applied to an Engage skill and evaluated on four scenarios, reaching a success rate of about 90%. The action-text input proves critical for switching between passive and active behavior, and a second-stage query asking when to next call the LLM is needed for acting at appropriate times. The work suggests a scalable, generalizable framework for LLM-assisted runtime planning in HRI that reduces hand-crafted heuristics, while raising open questions about how far to rely on the LLM versus guiding it, and about future integration with trajectory-level planning.
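To make the idea of an event-driven timing manager concrete, here is a minimal sketch, assuming that each planning step returns both the next action and the event that should trigger the next planning call. The event names, the `TimingManager` class, the planner signature, and `stub_planner` are all hypothetical illustrations, not the authors' implementation; a real system would replace `stub_planner` with an LLM query.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical event names; the actual skill's event set is not given here.
EVENTS = ("human_moved", "human_spoke", "action_finished")

# Assumed planner signature: (situation, current_action) -> (next_action, wake_event).
Planner = Callable[[str, str], Tuple[str, str]]


@dataclass
class TimingManager:
    """Event-driven gate: the planner (e.g. an LLM call) runs only when the
    event selected at the previous planning step actually occurs."""

    planner: Planner
    wake_event: str = "action_finished"
    current_action: str = "wait"
    history: List[str] = field(default_factory=list)

    def on_event(self, event: str, situation: str) -> Optional[str]:
        if event != self.wake_event:
            return None  # keep executing the current action; no planner call
        action, wake = self.planner(situation, self.current_action)
        # Unknown events fall back to re-planning when the action finishes.
        self.wake_event = wake if wake in EVENTS else "action_finished"
        self.current_action = action
        self.history.append(action)
        return action


def stub_planner(situation: str, current_action: str) -> Tuple[str, str]:
    # Stand-in for the LLM: stay passive while the human is busy, else engage.
    if "busy" in situation:
        return "wait", "human_moved"
    return "greet", "human_spoke"


tm = TimingManager(planner=stub_planner)
print(tm.on_event("human_moved", "human is busy"))      # None: not the awaited event
print(tm.on_event("action_finished", "human is busy"))  # wait
print(tm.on_event("human_moved", "human is free"))      # greet
print(tm.history)                                       # ['wait', 'greet']
```

The design choice illustrated here is that the planner is not polled continuously: between events the robot simply keeps executing its current action, which limits how often the (comparatively slow) LLM has to be queried.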
Abstract
Recent large language models (LLMs) are capable of planning robot actions. In this paper, we explore how LLMs can be used to plan actions for tasks involving situational human-robot interaction (HRI). A key problem in applying LLMs to situational HRI is balancing "respecting the current human's activity" against "prioritizing the robot's task," as well as determining when to use the LLM to generate an action plan. In this paper, we propose the plan-and-act skill design needed to solve these problems. We show that a critical factor in enabling a robot to switch between passive and active interaction behavior is providing the LLM with an action text that describes the robot's current action. We also show that a second-stage question to the LLM, asking when the LLM should next be called, is necessary for planning actions at appropriate times. The skill design is applied to an Engage skill and tested on four distinct interaction scenarios. We show that by using this skill design, LLMs can be leveraged to scale easily to different HRI scenarios, with a reasonable success rate reaching 90% on the test scenarios.
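The two-stage querying described above can be sketched as follows, assuming the LLM is any function that maps a prompt string to a text answer. The prompt wording, the `ACTIONS` and `TIMINGS` lists, `plan_step`, and `fake_llm` are hypothetical; restricting answers to a fixed list is one plausible way to use a bottom-up action set, not necessarily the paper's exact method. The first query includes the action text about the robot's current action; the second asks when the LLM should next be consulted.

```python
from typing import Callable, Tuple

# Hypothetical candidate actions and timings; not the paper's actual lists.
ACTIONS = ["wait", "look_at_human", "approach", "greet"]
TIMINGS = ["when_human_moves", "when_human_speaks", "when_action_finishes"]


def build_action_prompt(situation: str, current_action: str) -> str:
    # The action text tells the LLM what the robot is doing right now,
    # so it can decide whether to keep engaging (active) or yield (passive).
    return (
        f"Situation: {situation}\n"
        f"Robot's current action: {current_action}\n"
        f"Choose the next action from {ACTIONS}. Answer with one item."
    )


def build_timing_prompt(situation: str, next_action: str) -> str:
    # Second-stage question: when should the LLM be called again?
    return (
        f"Situation: {situation}\n"
        f"The robot will now: {next_action}\n"
        f"When should the plan be revisited? Choose from {TIMINGS}."
    )


def plan_step(llm: Callable[[str], str], situation: str,
              current_action: str) -> Tuple[str, str]:
    action = llm(build_action_prompt(situation, current_action)).strip()
    if action not in ACTIONS:
        action = "wait"  # fall back to a passive action on unusable output
    timing = llm(build_timing_prompt(situation, action)).strip()
    if timing not in TIMINGS:
        timing = "when_action_finishes"  # conservative default timing
    return action, timing


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call, for illustration only.
    if "Choose the next action" in prompt:
        return "wait" if "reading" in prompt else "greet"
    if "The robot will now: wait" in prompt:
        return "when_human_moves"
    return "when_human_speaks"


print(plan_step(fake_llm, "A person is reading a book.", "look_at_human"))
print(plan_step(fake_llm, "A person is walking toward the robot.", "wait"))
```

With the stub, the first call yields `('wait', 'when_human_moves')` and the second `('greet', 'when_human_speaks')`: the robot stays passive while the person is occupied and becomes active otherwise, and each decision carries its own cue for when to re-plan.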
