ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

Corban Rivera, Grayson Byrd, William Paul, Tyler Feldman, Meghan Booker, Emma Holmes, David Handelman, Bethany Kemp, Andrew Badger, Aurora Schmidt, Krishna Murthy Jatavallabhula, Celso M de Melo, Lalithkumar Seenivasan, Mathias Unberath, Rama Chellappa

TL;DR

ConceptAgent is a natural language-driven robotic platform for task execution in unstructured environments. It targets the scalability and reliability of LLM-based planning in complex state and action spaces, and presents innovations designed to limit planning failures caused by LLM hallucinations.

Abstract

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which cause planned actions to fail at execution time, largely due to logical fallacies at the high or low level. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural language-driven robotic platform designed for task execution in unstructured environments. With a focus on scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self-reflection. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy-level embodied tasks, outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Additionally, ablation studies on moderate-to-hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search to enable more robust automation in complex state and action spaces.
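
The abstract describes Predicate Grounding as a way to prevent and recover from infeasible actions. The following is a minimal sketch of that idea, not the authors' implementation: before a skill runs, its preconditions are checked against the current state, and any unmet precondition is turned into text feedback that a planner could use to replan. The `State` class, the skill names (`place_in`, `open`, `pick_up`), and the precondition rules are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class State:
    """Hypothetical symbolic state: what the gripper holds, which receptacles are open."""
    holding: Optional[str] = None
    open_receptacles: Set[str] = field(default_factory=set)


def unmet_preconditions(skill: str, target: str, state: State) -> List[str]:
    """Return human-readable descriptions of preconditions that do not hold."""
    unmet = []
    if skill == "place_in":
        if state.holding is None:
            unmet.append("robot must be holding an object")
        if target not in state.open_receptacles:
            unmet.append(f"{target} must be open")
    elif skill in ("open", "pick_up"):
        if state.holding is not None:
            unmet.append("gripper must be empty")
    return unmet


def check_action(skill: str, target: str, state: State) -> str:
    """Return "OK" if the action is feasible, else feedback text for the planner."""
    unmet = unmet_preconditions(skill, target, state)
    if not unmet:
        return "OK"
    return f"Cannot {skill} {target}: " + "; ".join(unmet)


# The robot holds the card but the drawer is still closed.
state = State(holding="credit card")
print(check_action("place_in", "drawer", state))
```

In a full system the feedback string would presumably be appended to the task history so that the next planning step can choose a recovery action, such as setting the object down and opening the drawer first.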

Paper Structure

This paper contains 31 sections, 4 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: ConceptAgent enables robust real-time, natural language-driven task execution in open-world environments. The problem requires the robot to operate in unfamiliar settings and manipulate novel objects to complete tasks described in unconstrained natural language. In this escape-room-motivated example, the task given is to "unlock the door". The robot must not only identify objects but also (a) understand the context of the scene, including (b) a handwritten note on the door with additional instructions. (c) The ConceptAgent-driven Spot robot then proceeds to complete the task successfully without intervention.
  • Figure 2: Overview of ConceptAgent closed-loop task planning and execution. The state is composed of a text description of the objective, task-relevant observations, and the task history. It is combined with the details of a parametric skills library. Tree-based planning is complemented with (b) LLM-based expansion and (c) LLM-based critique and scoring. (a) Selection and (d) backpropagation are conducted as in Monte Carlo Tree Search.
  • Figure 3: Evaluation of Physical Mobile Manipulation for Open-Vocabulary Object Rearrangement - Object rearrangement success and failure cases aggregated over all levels of clutter, broken down by mode of failure or success. From left to right, we show the system's performance at locating the object to rearrange, navigating to it, perceiving it, grasping it, locating the destination, navigating to the destination, and placing the object into the receptacle.
  • Figure 4: Mobile Manipulation Trials - The trials were aimed at categorizing failure modes for physical mobile manipulation.
  • Figure 5: Example Task Completion in AI2Thor by ConceptAgent. Task: Put the credit card on the counter into the kitchen drawer. (a) The agent starts in the kitchen, (b) after some exploration the agent finds the credit card, (c) the agent takes the credit card to the kitchen drawer, but it is closed, (d) the agent adapts by placing the card by the kitchen sink, before (e) moving back to the drawer and opening it. (f) The agent moves back to the sink to pick up the card again, before (g) moving back to the open drawer, and (h) placing the credit card in the drawer to complete the task.
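
The four planning steps named in the Figure 2 caption (selection, LLM-based expansion, LLM-based critique and scoring, backpropagation) can be sketched as a small tree search. This is a hedged illustration under stated assumptions, not the paper's code: the two LLM calls are replaced by deterministic stub functions `propose` and `critique`, the action names and `GOAL` plan are hypothetical (loosely modeled on the Figure 5 task), and the UCT selection rule with exploration constant `c=1.4` is a common default rather than something the source specifies.

```python
import math

ACTIONS = ["open drawer", "pick up card", "place card in drawer"]  # hypothetical skills
GOAL = ("open drawer", "pick up card", "place card in drawer")     # hypothetical good plan


class Node:
    def __init__(self, history, parent=None):
        self.history = history   # tuple of actions taken so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0         # sum of backed-up scores


def uct(node, c=1.4):
    """Upper confidence bound for trees; unvisited nodes are tried first."""
    if node.visits == 0:
        return math.inf
    mean = node.value / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def select(root):
    """(a) Selection: descend by UCT until reaching a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=uct)
    return node


def propose(history):
    """(b) Stub for LLM-based expansion: suggest actions not yet taken."""
    return [a for a in ACTIONS if a not in history]


def critique(history):
    """(c) Stub for LLM-based critique: fraction of the goal plan matched as a prefix."""
    matched = 0
    for action, wanted in zip(history, GOAL):
        if action != wanted:
            break
        matched += 1
    return matched / len(GOAL)


def backpropagate(node, score):
    """(d) Backpropagation: add the score to every node on the path to the root."""
    while node is not None:
        node.visits += 1
        node.value += score
        node = node.parent


def search(iterations=50):
    root = Node(())
    for _ in range(iterations):
        leaf = select(root)
        # Expand the root immediately and other leaves once they have been scored.
        if leaf.parent is None or leaf.visits > 0:
            for action in propose(leaf.history):
                leaf.children.append(Node(leaf.history + (action,), parent=leaf))
            if leaf.children:
                leaf = leaf.children[0]
        backpropagate(leaf, critique(leaf.history))
    if not root.children:
        return None
    # Execute the most-visited first action, then replan from the new state.
    return max(root.children, key=lambda n: n.visits).history[-1]


print(search())
```

In an embodied, closed-loop setting the returned action would be executed, the observations and task history updated, and the search run again from the new state; with real LLM calls, `propose` and `critique` would take the text state described in the caption instead of a bare action tuple.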