Exploring GPT-4 for Robotic Agent Strategy with Real-Time State Feedback and a Reactive Behaviour Framework
Thomas O'Brien, Ysobel Sims
TL;DR
The paper addresses how a humanoid robot can interpret high-level user goals via an LLM and execute them through a reactive, tree-based behaviour framework. It integrates GPT-4 as an LLM Provider within the Director framework and NUClear real-time messaging to produce rolling task plans with safety guarantees. Experiments in Webots simulation and on a NUgus platform show that the approach achieves most goals with smooth transitions, while highlighting issues in perception feedback and the cost of online LLM deployment. The work suggests that combining reactive behaviour trees with LLMs offers a practical path toward adaptable, safety-conscious robotic agents, with future work focusing on broader tasks, improved localization, and local LLM deployment.
Abstract
We explore the use of GPT-4 on a humanoid robot in simulation and the real world as a proof of concept of a novel behaviour method driven by a large language model (LLM). LLMs have shown the ability to perform various tasks, including robotic agent behaviour. The problem involves prompting the LLM with a goal, after which the LLM outputs the sub-tasks needed to achieve that goal. Previous works focus on the executability and correctness of the LLM's generated tasks. We propose a method that successfully addresses practical concerns around safety, transitions between tasks, time horizons of tasks, and state feedback. In our experiments, we found that, for feasible requests, our approach produces output that can be executed every time, with smooth transitions between tasks. User requests are achieved most of the time across a range of goal time horizons.
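The goal-to-sub-task loop with state feedback described above can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Director or NUClear APIs: the task names, `RobotState` fields, `stub_llm`, and `simulate` are all hypothetical stand-ins, and a deterministic stub replaces the GPT-4 call. It only shows the general idea of re-prompting with the current state at each step and rejecting any output that is not a known task.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

# Hypothetical task vocabulary; the paper's actual task set is not shown here.
SAFE_FALLBACK = "stand_still"
ALLOWED_TASKS = {"stand_still", "look_for_ball", "walk_to_ball", "kick_ball", "done"}


@dataclass(frozen=True)
class RobotState:
    """Toy state fed back to the LLM on every step (assumed fields)."""
    ball_visible: bool = False
    near_ball: bool = False
    ball_kicked: bool = False


def build_prompt(goal: str, state: RobotState, history: List[str]) -> str:
    """Combine the user goal, current state, and completed tasks into one prompt."""
    return (
        f"Goal: {goal}\n"
        f"State: ball_visible={state.ball_visible}, "
        f"near_ball={state.near_ball}, ball_kicked={state.ball_kicked}\n"
        f"Completed tasks: {history}\n"
        f"Reply with exactly one of: {sorted(ALLOWED_TASKS)}"
    )


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (assumption, not GPT-4)."""
    if "ball_kicked=True" in prompt:
        return "done"
    if "near_ball=True" in prompt:
        return "kick_ball"
    if "ball_visible=True" in prompt:
        return "walk_to_ball"
    return "look_for_ball"


def simulate(task: str, state: RobotState) -> RobotState:
    """Toy world model: each task succeeds only if its precondition holds."""
    if task == "look_for_ball":
        return replace(state, ball_visible=True)
    if task == "walk_to_ball" and state.ball_visible:
        return replace(state, near_ball=True)
    if task == "kick_ball" and state.near_ball:
        return replace(state, ball_kicked=True)
    return state


def run_rolling_plan(
    goal: str,
    llm: Callable[[str], str],
    state: RobotState = RobotState(),
    max_steps: int = 10,
) -> List[str]:
    """Ask for one task at a time, validating each reply before executing it."""
    history: List[str] = []
    for _ in range(max_steps):
        task = llm(build_prompt(goal, state, history)).strip()
        if task not in ALLOWED_TASKS:
            task = SAFE_FALLBACK  # unknown output never reaches the robot
        if task == "done":
            break
        state = simulate(task, state)
        history.append(task)
    return history


if __name__ == "__main__":
    print(run_rolling_plan("kick the ball", stub_llm))
```

Requesting a single task per step, rather than a full plan up front, lets each new prompt reflect the latest state; validating against a fixed task set is one simple way to keep malformed LLM output from being executed.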
