STAR: A Foundation Model-driven Framework for Robust Task Planning and Failure Recovery in Robotic Systems

Md Sadman Sakib, Yu Sun

TL;DR

STAR addresses the challenge of robust, autonomous task planning and failure recovery in dynamic robotics by integrating Foundation Models with a dynamically expanding Knowledge Graph (FOON) and a specialized FailNet for failure strategies. The framework alternates between FM-driven reasoning and KG-grounded retrieval, using Lifelong Learning to evolve knowledge and reduce reliance on large models over time. Failure detection uses Vision-Language Models for mid-execution diagnostics, and hierarchical re-planning leverages FailNet to generate context-aware recovery plans that are executed via PDDL and classical planners. Experimental results show 86% task planning accuracy and 78% recovery success, with additional improvements from image-grid frame sampling and progressive failure detection, highlighting STAR’s potential for long-term, multi-domain deployment in real-world settings.

Abstract

Modern robotic systems, deployed across domains from industrial automation to domestic assistance, face a critical challenge: executing tasks with precision and adaptability in dynamic, unpredictable environments. To address this, we propose STAR (Smart Task Adaptation and Recovery), a novel framework that synergizes Foundation Models (FMs) with dynamically expanding Knowledge Graphs (KGs) to enable resilient task planning and autonomous failure recovery. While FMs offer remarkable generalization and contextual reasoning, their limitations, including computational inefficiency, hallucinations, and output inconsistencies, hinder reliable deployment. STAR mitigates these issues by embedding learned knowledge into structured, reusable KGs, which streamline information retrieval, reduce redundant FM computations, and provide precise, scenario-specific insights. The framework leverages FM-driven reasoning to diagnose failures, generate context-aware recovery strategies, and execute corrective actions without human intervention or system restarts. Unlike conventional approaches that rely on rigid protocols, STAR dynamically expands its KG with experiential knowledge, ensuring continuous adaptation to novel scenarios. To evaluate the effectiveness of this approach, we developed a comprehensive dataset that includes various robotic tasks and failure scenarios. Through extensive experimentation, STAR achieved 86% task planning accuracy and a 78% recovery success rate, showing significant improvements over baseline methods. The framework's ability to continuously learn from experience while maintaining structured knowledge representation makes it particularly suitable for long-term deployment in real-world applications.

Paper Structure

This paper contains 26 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of our task planning and failure recovery system.
  • Figure 2: Task planning pipeline in STAR. Given a natural language input, the system searches for matching tasks in the Knowledge Graph (KG). If no match exists, an LLM generates a new task tree, which is verified, converted to PDDL format, and stored in the KG for future use.
  • Figure 3: An example of task planning using STAR. For better readability, the functional units in the task tree are translated to natural language sentences.
  • Figure 4: Failure detection pipeline.
  • Figure 6: Snapshot from FailNet
  • ...and 2 more figures
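
The retrieve-or-generate flow described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `TaskPlanner` class, the `normalize` matching key, the dictionary standing in for the Knowledge Graph, and the `to_pddl_like` formatter are all hypothetical names introduced here, and the generator and verifier are passed in as plain callables rather than calls to any real model.

```python
from typing import Callable, Dict, List, Tuple

# A task tree is simplified here to an ordered list of (action, object) steps.
# This is an assumption for illustration; the paper's task trees are richer.
TaskTree = List[Tuple[str, str]]


def normalize(instruction: str) -> str:
    """Hypothetical matching key: lowercase text with collapsed whitespace."""
    return " ".join(instruction.lower().split())


def to_pddl_like(tree: TaskTree) -> List[str]:
    """Format each step as a parenthesized action string.

    This only mimics the look of a PDDL plan; it is not a real PDDL domain.
    """
    return [f"({action} {obj})" for action, obj in tree]


class TaskPlanner:
    """Look up a plan in the store first; call the generator only on a miss."""

    def __init__(
        self,
        generate: Callable[[str], TaskTree],
        verify: Callable[[TaskTree], bool],
    ) -> None:
        self.kg: Dict[str, List[str]] = {}  # stand-in for the Knowledge Graph
        self.generate = generate            # stand-in for the LLM call
        self.verify = verify                # stand-in for task-tree verification
        self.fm_calls = 0                   # counts how often the generator ran

    def plan(self, instruction: str) -> List[str]:
        key = normalize(instruction)
        if key in self.kg:
            # Match found: reuse the stored plan, no model call needed.
            return self.kg[key]
        # No match: generate a new task tree, verify it, convert, and store.
        self.fm_calls += 1
        tree = self.generate(instruction)
        if not self.verify(tree):
            raise ValueError(f"generated task tree failed verification: {tree!r}")
        plan = to_pddl_like(tree)
        self.kg[key] = plan
        return plan
```

In this sketch a repeated instruction is served from the store, so `fm_calls` stays flat once a task has been seen. That is the sense in which storing verified plans can reduce reliance on the model over time.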