Retrieval-Augmented Embodied Agents

Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

TL;DR

This paper tackles the data-efficiency challenge in embodied robotics by introducing Retrieval-Augmented Embodied Agents (RAEA), which use an external policy memory bank to guide action. RAEA comprises a multi-modal policy retriever and a policy generator that attends to the retrieved policies through cross-attention in a Transformer. The memory bank incorporates data from diverse embodiments, drawn from an external dataset such as Open X-Embodiment. In simulation on Franka Kitchen, MetaWorld, and ManiSkill2, and in real-robot experiments, RAEA outperforms state-of-the-art baselines and generalizes better, especially in low-data regimes. The results point to memory-augmented, cross-embodiment learning as a practical route to more robust robotic manipulation in uncertain environments.
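The retrieval step described above can be pictured as a nearest-neighbour lookup over the memory bank. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each memory-bank entry stores a precomputed key embedding, that image and language embeddings are fused by averaging their normalized vectors, and that relevance is scored with cosine similarity. `MemoryEntry`, `fuse`, `retrieve`, and the toy bank are hypothetical names and data.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MemoryEntry:
    """Hypothetical memory-bank entry: a key embedding plus the stored policy rollout."""
    key: List[float]            # assumed precomputed multi-modal embedding of the episode
    embodiment: str             # robot type the episode came from (illustrative labels)
    actions: List[List[float]]  # stored action sequence for the episode


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def fuse(image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Assumed fusion rule: mean of the normalized image and text embeddings."""
    a, b = l2_normalize(image_emb), l2_normalize(text_emb)
    return l2_normalize([(x + y) / 2 for x, y in zip(a, b)])


def retrieve(bank: List[MemoryEntry], image_emb: Sequence[float],
             text_emb: Sequence[float], k: int = 2) -> List[Tuple[float, MemoryEntry]]:
    """Return the k (score, entry) pairs whose keys best match the fused query."""
    query = fuse(image_emb, text_emb)
    scored = [(cosine(query, entry.key), entry) for entry in bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy bank with three illustrative entries from different embodiments.
example_bank = [
    MemoryEntry(key=[1.0, 0.0, 0.0], embodiment="franka", actions=[[0.1, 0.0]]),
    MemoryEntry(key=[0.0, 1.0, 0.0], embodiment="widowx", actions=[[0.0, 0.2]]),
    MemoryEntry(key=[0.7, 0.7, 0.0], embodiment="ur5", actions=[[0.1, 0.1]]),
]

for score, entry in retrieve(example_bank, [1.0, 0.0, 0.0], [1.0, 0.2, 0.0]):
    print(f"{entry.embodiment}: {score:.3f}")
```

Averaging is only one possible fusion choice. A learned projection over concatenated features would be equally plausible, and a large bank would typically be searched with an approximate nearest-neighbour index rather than a linear scan.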

Abstract

Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human approach in robotics, we introduce the Retrieval-Augmented Embodied Agent (RAEA). This innovative system equips robots with a form of shared memory, significantly enhancing their performance. Our approach integrates a policy retriever, allowing robots to access relevant strategies from an external policy memory bank based on multi-modal inputs. Additionally, a policy generator is employed to assimilate these strategies into the learning process, enabling robots to formulate effective responses to tasks. Extensive testing of RAEA in both simulated and real-world scenarios demonstrates its superior performance over traditional methods, representing a major leap forward in robotic technology.

Paper Structure

This paper contains 10 sections, 5 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of our retrieval-augmented embodied agents. A policy retriever extracts policies from a policy memory bank containing large-scale robotic data across multiple embodiments. The policy generator then references the retrieved policies and outputs actions for the current input.
  • Figure 2: Examples of the simulated and real-world environments used for evaluation.
  • Figure 3: Framework of the policy retriever (top) and the policy generator (bottom). The policy retriever retrieves relevant policies based on multi-modal input, and the policy generator processes the list of retrieved policies to guide training in the current environment.
  • Figure 4: Left: setup of our real Franka robot. Right: examples of the tasks we collected.
  • Figure 5: Performance of RAEA in Franka Kitchen with 10 or 25 demonstrations.
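Figure 3's caption and the TL;DR describe the policy generator as a Transformer that consumes the retrieved policies through cross-attention. The following is a minimal sketch of that single mechanism. It assumes one observation token, retrieved-policy tokens that are already embedded, a single attention head, a residual connection, and a linear action head. The weight matrices, `generate_action`, and the residual-plus-linear-head design are illustrative assumptions. The paper's actual layers, heads, normalization, and tokenization are not described in this summary.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tok: Vector, memory_toks: List[Vector],
                    Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention from one query token to retrieved tokens."""
    q = matvec(Wq, query_tok)
    keys = [matvec(Wk, t) for t in memory_toks]
    values = [matvec(Wv, t) for t in memory_toks]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    # Weighted sum of value vectors, one output dimension at a time.
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return attended, weights


def generate_action(obs_tok: Vector, retrieved_toks: List[Vector], Wq: Matrix,
                    Wk: Matrix, Wv: Matrix, W_head: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical generator step: fuse observation with retrieved context, map to an action."""
    attended, weights = cross_attention(obs_tok, retrieved_toks, Wq, Wk, Wv)
    fused = [o + a for o, a in zip(obs_tok, attended)]  # residual connection
    return matvec(W_head, fused), weights


# Toy 3-d example: identity projections and a 2-d linear action head.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
action, attn = generate_action([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                               I3, I3, I3, head)
print("attention weights:", attn, "action:", action)
```

In a trained model, `Wq`, `Wk`, `Wv`, and `W_head` would be learned. The attention weights indicate which retrieved policies most influence the predicted action, which is how retrieved experience from other embodiments can shape behavior in the current environment.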