Addressing Failures in Robotics using Vision-Based Language Models (VLMs) and Behavior Trees (BT)

Faseeh Ahmad, Jonathan Styrud, Volker Krueger

TL;DR

This paper uses VLMs as a monitoring tool that detects and identifies failures during task execution. The VLM also generates missing conditions or skill templates, which are incorporated into the BT so that the system can autonomously address similar failures in future tasks.

Abstract

In this paper, we propose an approach that combines Vision Language Models (VLMs) and Behavior Trees (BTs) to address failures in robotics. Current robotic systems can handle known failures with pre-existing recovery strategies, but they are often ill-equipped to manage unknown failures or anomalies. We introduce VLMs as a monitoring tool to detect and identify failures during task execution. Additionally, VLMs generate missing conditions or skill templates that are then incorporated into the BT, ensuring the system can autonomously address similar failures in future tasks. We validate our approach through simulations in several failure scenarios.

Paper Structure

This paper contains 12 sections and 3 figures.

Figures (3)

  • Figure 1: Overview of the proposed approach, where the VLM takes a set of images, skills, conditions, and a BT as input. The VLM uses this information to provide missing conditions or skills, which are then used to update the BT through a planner.
  • Figure 2: Comparison of Behavior Trees (BTs). The left side shows the initial BT, while the right side shows the updated BT, with the changes highlighted by red connections.
  • Figure 3: Scenes showing peg-in-hole task execution with obstacles. The first row (1(a)–1(d)) illustrates the task with a small obstacle, while the second row (2(a)–2(d)) depicts the task with a large obstacle.
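
To make the BT update described in Figures 1 and 2 more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a toy peg-in-hole world in which an obstacle blocks the hole, loosely following Figure 3. All names (`insert_peg`, `remove_obstacle`, `hole_clear`, `add_precondition`, `vlm_suggestion`) are hypothetical. The VLM output is hard-coded as a dictionary instead of coming from a model, and the tree is edited directly instead of through the planner the paper describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


World = Dict[str, bool]


@dataclass
class Leaf:
    """A condition or action node: wraps a function that returns True on success."""
    name: str
    fn: Callable[[World], bool]

    def tick(self, world: World) -> Status:
        return Status.SUCCESS if self.fn(world) else Status.FAILURE


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, ticked in order."""
    children: list

    def tick(self, world: World) -> Status:
        for child in self.children:
            if child.tick(world) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Succeeds as soon as one child succeeds, ticked in order."""
    children: list

    def tick(self, world: World) -> Status:
        for child in self.children:
            if child.tick(world) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


# Hypothetical skills for the toy world.
def insert_peg(world: World) -> bool:
    if not world["hole_clear"]:
        return False  # the obstacle blocks the insertion
    world["peg_inserted"] = True
    return True


def remove_obstacle(world: World) -> bool:
    world["hole_clear"] = True
    return True


# Hypothetical skill and condition libraries, looked up by name.
SKILLS = {"insert_peg": insert_peg, "remove_obstacle": remove_obstacle}
CONDITIONS = {"hole_clear": lambda world: world["hole_clear"]}

# Initial BT: it has no check for a blocked hole.
initial_bt = Sequence([Leaf("insert_peg", SKILLS["insert_peg"])])

# Assumed stand-in for what a VLM might return after seeing the failure.
vlm_suggestion = {
    "before_skill": "insert_peg",
    "missing_condition": "hole_clear",
    "recovery_skill": "remove_obstacle",
}


def add_precondition(tree: Sequence, suggestion: Dict[str, str]) -> Sequence:
    """Return a new tree with a Fallback(condition, recovery) guard
    inserted before the named skill. The input tree is left unchanged."""
    condition = Leaf(suggestion["missing_condition"],
                     CONDITIONS[suggestion["missing_condition"]])
    recovery = Leaf(suggestion["recovery_skill"],
                    SKILLS[suggestion["recovery_skill"]])
    children = []
    for child in tree.children:
        if getattr(child, "name", None) == suggestion["before_skill"]:
            children.append(Fallback([condition, recovery]))
        children.append(child)
    return Sequence(children)
```

Ticking `initial_bt` on a world where `hole_clear` is false returns a failure. Ticking the tree returned by `add_precondition(initial_bt, vlm_suggestion)` on the same world first runs the recovery skill and then inserts the peg, so the guard stays in the tree for later runs.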