Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang
TL;DR
This paper tackles the data-efficiency challenge in embodied robotics by introducing Retrieval-Augmented Embodied Agents (RAEA), which use an external policy memory bank to guide action. RAEA comprises a multi-modal policy retriever and a policy generator; the generator attends to the retrieved policies through cross-attention within a Transformer framework. The memory bank is built from an external dataset such as Open X-Embodiment, so it carries information from diverse embodiments. In extensive simulations on Franka Kitchen, MetaWorld, and Maniskill-2, plus real-robot experiments, RAEA shows improved generalization and performance over state-of-the-art baselines, especially in low-data regimes. The approach highlights the practical value of memory-augmented, cross-embodiment learning for robust robotic manipulation in uncertain environments.
Abstract
Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks proficiently, their success often hinges on extensive training data. In contrast, humans typically solve new problems by recalling past experiences and analogous situations. To emulate this approach in robotics, we introduce the Retrieval-Augmented Embodied Agent (RAEA), a system that equips robots with a form of shared memory. Our approach integrates a policy retriever, which lets robots access relevant strategies from an external policy memory bank based on multi-modal inputs. A policy generator then assimilates these strategies into the learning process, enabling robots to formulate effective responses to tasks. Extensive testing of RAEA in both simulated and real-world scenarios demonstrates superior performance over traditional methods.
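The retrieve-then-attend pattern described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`retrieve`, `cross_attend`), the cosine-similarity ranking, the single-head attention, and the hand-written 3-dimensional keys and 2-dimensional "policy" values are all assumptions made for illustration. The actual system presumably uses learned multi-modal encoders and a full Transformer.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def retrieve(query, memory_keys, k):
    """Return indices of the k memory entries most similar to the query.

    Hypothetical stand-in for a policy retriever: ranks by cosine similarity.
    """
    scores = [cosine(query, key) for key in memory_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over retrieved entries.

    Hypothetical stand-in for the cross-attention step of a policy generator.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights

# Toy memory bank (assumed data): each key embeds a stored situation,
# each value is a stored policy feature vector.
memory_keys = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
memory_values = [[0.5, 0.0], [0.4, 0.1], [-0.3, 0.2], [0.0, -0.5]]

query = [1.0, 0.0, 0.0]  # embedding of the current observation/instruction
top = retrieve(query, memory_keys, k=2)
fused, weights = cross_attend(
    query,
    [memory_keys[i] for i in top],
    [memory_values[i] for i in top],
)
```

Here `fused` is a weighted blend of the two retrieved policy vectors; in a real agent it would condition action prediction rather than be used directly.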
