I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences

Zihan Wang, Brian Liang, Varad Dhat, Zander Brumbaugh, Nick Walker, Ranjay Krishna, Maya Cakmak

TL;DR

RONAR is an LLM-based system that generates natural language narrations from robot experiences, aiding behavior announcement, failure analysis, and human interaction for failure recovery. It outperforms state-of-the-art methods and improves failure recovery efficiency.

Abstract

Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding behavior announcement, failure analysis, and human interaction to recover from failure. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a comprehensive real-robot dataset, and empirical evidence of RONAR's effectiveness in enhancing user experience in system transparency and failure analysis.

Paper Structure

This paper contains 33 sections, 3 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Left: Our framework for real-world robot narration, RONAR. It takes in four categories of dynamic inputs and one static input: multimodal environmental observations (E), robot internal states (I), task planner (TP), and specified conditions (C), along with robot specifications (SP). RONAR then uses its LLM-based narration engine to process these inputs and generate narrations based on the specified narration mode. The generated narration can be used to address downstream narration-related tasks. Right: The RoboNar dataset. It includes four daily housekeeping tasks with real failure cases, containing ground truth failure explanations and recovery descriptions labeled by human experts.
  • Figure 2: RONAR: Our framework for real-world robot narration. It has three parts, which are key frame selection, experience summarization and narration generation. It takes in the raw multimodal robot data stream and outputs text describing past experiences, current observations, and future plans of the robot.
  • Figure 3: Example of narrations generated by RONAR with different modes.
  • Figure 4: RoboNar Dataset: We design four long-horizon tasks for a Stretch robot in a home environment. Left: the different tasks with base and manipulator trajectories, along with the states the robot experiences in each task. Right: the number of failure cases under each robot state in the dataset. The pictures are failure cases selected from the dataset, and the text is the human-expert-provided ground truth label for each frame.
  • Figure 5: Accuracy on failure analysis tasks using different methods.
  • ...and 14 more figures
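
The captions of Figures 1 and 2 describe a three-part pipeline: key frame selection, experience summarization, and narration generation. The following is a minimal Python sketch of that data flow only, not the authors' implementation. The `Frame` structure, the change-detection rule used to pick key frames, the prompt wording, the `"summary"` mode name, and the `llm` callable are all assumptions made for illustration; observations are stood in for by plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One time-stamped snapshot of the robot data stream (hypothetical structure)."""
    t: float             # timestamp in seconds
    observation: str     # E: text stand-in for multimodal environmental observations
    internal_state: str  # I: robot internal state label
    plan_step: str       # TP: current task planner step


def select_key_frames(frames: List[Frame]) -> List[Frame]:
    """Keep the first frame and every frame where the state or plan step changes.

    This change-detection rule is an assumption, chosen only to show how a
    high-rate stream can be reduced before it reaches a language model.
    """
    keys: List[Frame] = []
    for f in frames:
        changed = not keys or (
            (f.internal_state, f.plan_step)
            != (keys[-1].internal_state, keys[-1].plan_step)
        )
        if changed:
            keys.append(f)
    return keys


def summarize(key_frames: List[Frame], robot_spec: str) -> str:
    """Flatten key frames and the static robot specification (SP) into text."""
    lines = [f"Robot: {robot_spec}"]
    for f in key_frames:
        lines.append(
            f"[{f.t:.1f}s] state={f.internal_state}; "
            f"step={f.plan_step}; sees: {f.observation}"
        )
    return "\n".join(lines)


def narrate(summary: str, mode: str, llm: Callable[[str], str]) -> str:
    """Build a prompt for the requested narration mode and pass it to `llm`.

    `llm` is any function from prompt text to generated text (hypothetical).
    """
    prompt = (
        f"Narration mode: {mode}\n"
        f"{summary}\n"
        "Describe past experiences, current observations, and future plans."
    )
    return llm(prompt)


# Usage with an identity function standing in for a real language model.
frames = [
    Frame(0.0, "hallway", "navigate", "go to table"),
    Frame(1.0, "hallway", "navigate", "go to table"),
    Frame(2.0, "cup on table", "grasp", "pick up cup"),
    Frame(3.0, "cup on table", "grasp", "pick up cup"),
]
keys = select_key_frames(frames)
text = narrate(summarize(keys, "mobile manipulator"), "summary", lambda p: p)
```

Swapping the identity function for a call to an actual model would be the only change needed to produce real narrations; how the paper selects key frames and structures its prompts is not stated in the captions above.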