
BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs

Alexander Leszczynski, Sarah Gillet, Iolanda Leite, Fethiye Irmak Dogan

TL;DR

BT-ACTION addresses the challenge of translating abstract user instructions into robotic actions by combining Behavior Trees with Large Language Models in a test-driven, modular framework. It introduces four BT-based components (Classification, Information, Sequence Generation, and Error Handling) that enable disambiguation, communication of the robot's capabilities, and safe plan expansion in kitchen tasks. In a kitchen-assistant scenario, the approach reduced task errors and discomfort, increased capacity trust, and was preferred by participants over direct LLM prompting, as shown in an analysis of 41 participants and an 18-example case dataset. The work advances practical, explainable instruction understanding for real-world HRI, and its code is publicly available to support adoption.

Abstract

Natural language instructions are often abstract and complex, requiring robots to execute multiple subtasks even for seemingly simple queries. For example, when a user asks a robot to prepare avocado toast, the task involves several sequential steps. Moreover, such instructions can be ambiguous, infeasible for the robot, or beyond the robot's existing knowledge. While Large Language Models (LLMs) offer strong language reasoning capabilities for handling these issues, integrating them effectively into robotic systems remains difficult. To address this, we propose BT-ACTION, a test-driven approach that combines the modular structure of Behavior Trees (BT) with LLMs to generate coherent sequences of robot actions for following complex user instructions, specifically in the context of preparing recipes in a kitchen-assistance setting. We evaluated BT-ACTION in a comprehensive user study with 45 participants, comparing its performance to direct LLM prompting. Results demonstrate that the modular design of BT-ACTION helped the robot make fewer mistakes and increased user trust, and participants showed a significant preference for the robot leveraging BT-ACTION. The code is publicly available at https://github.com/1Eggbert7/BT_LLM.
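The TL;DR names four BT-based components (Classification, Information, Sequence Generation, Error Handling). The following is a minimal Python sketch of how components like these can be composed with standard behavior-tree Sequence and Fallback nodes. It is not the BT-ACTION implementation from the repository: the action set, recipe library, branch layout, and the keyword-based `llm_stub` and `plan_stub` functions (stand-ins for real LLM calls) are all hypothetical.

```python
from enum import Enum
from typing import Any, Callable, Dict, List

# Hypothetical action set A and recipe library (not the paper's).
ACTIONS = {"get_bread", "toast_bread", "get_avocado",
           "mash_avocado", "spread_on_toast", "serve"}
KNOWN_RECIPES = {"avocado toast": ["get_bread", "toast_bread", "get_avocado",
                                   "mash_avocado", "spread_on_toast", "serve"]}
Blackboard = Dict[str, Any]

class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"

class Action:
    """Leaf node: runs fn on the shared blackboard and maps True/False to a Status."""
    def __init__(self, name: str, fn: Callable[[Blackboard], bool]):
        self.name, self.fn = name, fn

    def tick(self, bb: Blackboard) -> Status:
        return Status.SUCCESS if self.fn(bb) else Status.FAILURE

class Sequence:
    """Composite node: ticks children in order, fails at the first failing child."""
    def __init__(self, name: str, children: list):
        self.name, self.children = name, children

    def tick(self, bb: Blackboard) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS

class Fallback:
    """Composite node: ticks children in order, succeeds at the first succeeding child."""
    def __init__(self, name: str, children: list):
        self.name, self.children = name, children

    def tick(self, bb: Blackboard) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE

def llm_stub(instruction: str) -> str:
    """Hypothetical stand-in for an LLM classifier (plain keyword rules)."""
    text = instruction.lower()
    if text.startswith("what can"):
        return "question"
    return "known" if any(n in text for n in KNOWN_RECIPES) else "new"

def plan_stub(instruction: str) -> List[str]:
    """Hypothetical stand-in for LLM sequence generation; includes an action outside A."""
    return ["get_bread", "toast_bread", "fry_egg", "serve"]

def classify(bb: Blackboard) -> bool:      # Classification component
    bb["category"] = llm_stub(bb["instruction"])
    return True

def inform(bb: Blackboard) -> bool:        # Information component
    bb["reply"] = "I can make: " + ", ".join(sorted(KNOWN_RECIPES))
    return True

def lookup(bb: Blackboard) -> bool:        # reuse a stored recipe
    text = bb["instruction"].lower()
    bb["plan"] = list(next(KNOWN_RECIPES[n] for n in KNOWN_RECIPES if n in text))
    return True

def generate(bb: Blackboard) -> bool:      # Sequence Generation component
    bb["plan"] = plan_stub(bb["instruction"])
    return True

def check_plan(bb: Blackboard) -> bool:    # Error Handling component
    unknown = [a for a in bb["plan"] if a not in ACTIONS]
    if unknown:
        bb["reply"], bb["plan"] = "Sorry, I cannot perform: " + ", ".join(unknown), []
        return False
    bb["reply"] = "Plan ready."
    return True

def branch(category: str, *steps: Action) -> Sequence:
    """Guarded branch: runs its steps only if the instruction has this category."""
    guard = Action("is_" + category, lambda bb: bb["category"] == category)
    return Sequence(category, [guard, *steps])

ROOT = Sequence("root", [
    Action("classify", classify),
    Fallback("dispatch", [
        branch("question", Action("inform", inform)),
        branch("known", Action("lookup", lookup), Action("check", check_plan)),
        branch("new", Action("generate", generate), Action("check", check_plan)),
    ]),
])

def handle(instruction: str) -> Blackboard:
    bb: Blackboard = {"instruction": instruction, "plan": []}
    bb["status"] = ROOT.tick(bb)
    return bb
```

In this sketch, `handle("Please make avocado toast")` returns the stored plan, `handle("What can you make?")` takes the Information branch, and an instruction whose generated plan contains an action outside the set (here the hypothetical `fry_egg`) is rejected by the error-handling node instead of being executed. Keeping that check as a separate node means no generated sequence reaches execution without passing it.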

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example task $T_i$ (preparing a bacon and egg sandwich), composed of step-by-step high-level robot actions, $T_i = \{a_0, \ldots, a_8\}$, $a_j \in A$.
  • Figure 2: Simplified Behavior Tree of the BT-ACTION System.
  • Figure 3: Sketch of the experiment setup.
  • Figure 5: Mean and standard error of the number of mistakes for the two conditions.
  • Figure 6: Mean and standard error for MDMT Capacity Trust for the two conditions.
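Figures 5 and 6 report the mean and standard error for each of the two conditions. For reference, the standard error of the mean is the sample standard deviation divided by $\sqrt{n}$. The minimal Python sketch below assumes one value per participant per condition and the usual $n-1$ sample standard deviation; the captions do not state how the authors computed it, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

def mean_and_se(values: List[float]) -> Tuple[float, float]:
    """Return the sample mean and the standard error of the mean, s / sqrt(n)."""
    if len(values) < 2:
        # stdev() needs at least two observations (n - 1 in the denominator).
        raise ValueError("need at least two observations")
    return mean(values), stdev(values) / math.sqrt(len(values))

def summarize(conditions: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each condition name to its (mean, standard error) pair."""
    return {name: mean_and_se(vals) for name, vals in conditions.items()}
```

Passing a dictionary that maps each condition to its per-participant values (for example, mistake counts) yields one (mean, SE) pair per condition, which is what each bar and error bar in such a plot encodes.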