RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery

Silvia Izquierdo-Badiola, Carlos Rizzo, Guillem Alenyà

TL;DR

RAIDER tackles the challenge of grounding and recovering from action-related issues in embodied robots operating around humans. It integrates an LLM with a grounded toolset under a "Ground, Ask&Answer, Issue" procedure that selectively gathers context and identifies precondition ambiguities or infeasibilities; its output is then used to generate recovery plans, which often involve human input. In simulated AI2THOR household tasks, RAIDER outperforms baselines in grounding, issue detection, and explanation, and its explanations substantially improve interactive recovery planning. The framework is modular and extensible, enabling adaptation to real-world assistive scenarios with minimal reconfiguration.

Abstract

As robots increasingly operate in dynamic human-centric environments, improving their ability to detect, explain, and recover from action-related issues becomes crucial. Traditional model-based and data-driven techniques lack adaptability, while more flexible generative AI methods struggle with grounding extracted information to real-world constraints. We introduce RAIDER, a novel agent that integrates Large Language Models (LLMs) with grounded tools for adaptable and efficient issue detection and explanation. Using a unique "Ground, Ask&Answer, Issue" procedure, RAIDER dynamically generates context-aware precondition questions and selects appropriate tools for resolution, achieving targeted information gathering. Our results within a simulated household environment surpass methods relying on predefined models, full scene descriptions, or standalone trained models. Additionally, RAIDER's explanations enhance recovery success, including cases requiring human interaction. Its modular architecture, featuring self-correction mechanisms, enables straightforward adaptation to diverse scenarios, as demonstrated in a real-world human-assistive task. This showcases RAIDER's potential as a versatile agentic AI solution for robotic issue detection and explanation, while addressing the problem of grounding generative AI for its effective application in embodied agents. Project website: https://eurecat.github.io/raider-llmagent/

Paper Structure

This paper contains 33 sections, 22 figures, 3 tables.

Figures (22)

  • Figure 1: RAIDER is composed of an LLM guided by a system prompt and a suite of tools, interconnected through a program flow managing their interactions. The system can process structured or unstructured action queries, referring to objects using various levels of abstraction. Following the "Ground, Ask&Answer, Issue" reasoning procedure, RAIDER dynamically generates and resolves precondition questions to determine ambiguity or unfeasibility issues. The system's output is leveraged by an LLM to generate a recovery plan involving human interaction.
  • Figure 2: RAIDER-LLM Agent. Qualitative examples showing an unfeasibility detection (top) and an ambiguity detection (bottom) in action execution. Based on the instruction and the available objects and tools, the LLM employs the "Ground, Ask&Answer, Issue" reasoning procedure detailed in the prompt, interacting with tools via the program flow manager.
  • Figure 3: The Program Flow Manager (PFM) regulates interactions between the LLM and the tools until a final response is produced with no tool calls left, or a timeout is reached. The interactions are verified at critical steps, with warnings appended to the conversation as required.
  • Figure 4: Full prompt template for issue detection in robotic action execution. The gray blocks are general to any issue detection application, while the purple blocks should be tailored to each specific scenario.
  • Figure 5: Prompt - Task Objective. The task objective consists of a list of potential issues to identify, which can be adapted based on the application needs.
  • ...and 17 more figures
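
The control loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ToolCall`, `LLMReply`, `run_program_flow`, `scripted_llm`, and the tool names `find` and `locate` are all hypothetical, the LLM is replaced by a scripted stand-in, and the only verification step shown is a check for unknown tool names.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str       # hypothetical tool identifier
    argument: str   # hypothetical single string argument


@dataclass
class LLMReply:
    text: str
    tool_calls: List[ToolCall] = field(default_factory=list)


def run_program_flow(
    llm: Callable[[List[str]], LLMReply],
    tools: Dict[str, Callable[[str], str]],
    query: str,
    timeout_s: float = 30.0,
    max_turns: int = 10,
) -> Optional[str]:
    """Loop between LLM and tools until a reply has no tool calls, or give up."""
    conversation = [f"user: {query}"]
    deadline = time.monotonic() + timeout_s
    for _ in range(max_turns):
        if time.monotonic() > deadline:
            return None  # timeout reached before a final response
        reply = llm(conversation)
        conversation.append(f"assistant: {reply.text}")
        if not reply.tool_calls:
            return reply.text  # final response: no tool calls left
        for call in reply.tool_calls:
            tool = tools.get(call.name)
            if tool is None:
                # Verification step (assumed): warn instead of failing hard.
                conversation.append(f"warning: unknown tool '{call.name}'")
                continue
            conversation.append(f"tool[{call.name}]: {tool(call.argument)}")
    return None  # turn budget exhausted


def scripted_llm(conversation: List[str]) -> LLMReply:
    """Hypothetical stand-in for an LLM: wrong tool first, then self-corrects."""
    if any(m.startswith("tool[locate]") for m in conversation):
        return LLMReply("Issue: the mug is not in the scene.")
    if any(m.startswith("warning:") for m in conversation):
        return LLMReply("Retrying with a valid tool.", [ToolCall("locate", "mug")])
    return LLMReply("Checking preconditions.", [ToolCall("find", "mug")])


tools = {"locate": lambda obj: f"{obj} not found"}
result = run_program_flow(scripted_llm, tools, "pick up the mug")
print(result)
```

In this sketch the warning is appended to the conversation rather than raised as an error, so the next LLM turn can see it and correct its tool call, which is one plausible reading of the self-correction behaviour the caption mentions. The turn limit is an added safeguard alongside the wall-clock timeout and is likewise an assumption.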