Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation

Quanting Xie, So Yeon Min, Pengliang Ji, Yue Yang, Tianyi Zhang, Kedi Xu, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, Yonatan Bisk

TL;DR

Embodied-RAG adapts retrieval-augmented generation (RAG) to embodied agents by pairing the agent's foundation model with a non-parametric memory that autonomously builds hierarchical knowledge for both navigation and language generation. The memory is organized as a semantic forest that stores language descriptions at varying levels of detail, so a query can target a specific object or ask for a holistic description of ambiance. The authors report that the system handles over 250 explanation and navigation queries across kilometer-level environments and produces context-sensitive outputs across different robotic platforms.

Abstract

There is no limit to how much a robot might explore and learn, but all of that knowledge needs to be searchable and actionable. Within language research, retrieval augmented generation (RAG) has become the workhorse of large-scale non-parametric knowledge; however, existing techniques do not directly transfer to the embodied domain, which is multimodal, where data is highly correlated, and perception requires abstraction. To address these challenges, we introduce Embodied-RAG, a framework that enhances the foundational model of an embodied agent with a non-parametric memory system capable of autonomously constructing hierarchical knowledge for both navigation and language generation. Embodied-RAG handles a full range of spatial and semantic resolutions across diverse environments and query types, whether for a specific object or a holistic description of ambiance. At its core, Embodied-RAG's memory is structured as a semantic forest, storing language descriptions at varying levels of detail. This hierarchical organization allows the system to efficiently generate context-sensitive outputs across different robotic platforms. We demonstrate that Embodied-RAG effectively bridges RAG to the robotics domain, successfully handling over 250 explanation and navigation queries across kilometer-level environments, highlighting its promise as a general-purpose non-parametric system for embodied agents.
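
To make the idea of a semantic forest more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each node holds a text description, that leaves carry a 2D position, and that retrieval descends greedily from the best-matching root to a leaf. The names `MemoryNode`, `overlap_score`, and `retrieve` are hypothetical, and the word-overlap score is only a stand-in for whatever language or embedding model a real system would use to judge relevance.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MemoryNode:
    """Hypothetical node: a language description, optional position, and children."""
    description: str
    position: Optional[Tuple[float, float]] = None  # assumed: only leaves have one
    children: List["MemoryNode"] = field(default_factory=list)


def overlap_score(query: str, description: str) -> int:
    """Toy relevance score: count of shared lowercase words (assumption, not the paper's scorer)."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def retrieve(roots: List[MemoryNode], query: str) -> MemoryNode:
    """Pick the best-scoring root, then follow the best-scoring child down to a leaf."""
    node = max(roots, key=lambda n: overlap_score(query, n.description))
    while node.children:
        node = max(node.children, key=lambda n: overlap_score(query, n.description))
    return node


# Two small trees form the forest: coarse descriptions at the roots, specific ones at the leaves.
kitchen = MemoryNode(
    "kitchen area with a sink and a coffee machine",
    children=[
        MemoryNode("coffee machine on the counter", (2.0, 1.0)),
        MemoryNode("sink with dishes", (2.5, 1.5)),
    ],
)
park = MemoryNode(
    "quiet park with a bench and trees",
    children=[
        MemoryNode("wooden bench under trees", (40.0, 12.0)),
        MemoryNode("fountain near path", (45.0, 10.0)),
    ],
)
forest = [kitchen, park]

coffee_hit = retrieve(forest, "where is the coffee machine")
bench_hit = retrieve(forest, "find a bench under trees")
print(coffee_hit.description, coffee_hit.position)
print(bench_hit.description, bench_hit.position)
```

In this sketch the root descriptions act as coarse summaries and the leaves as specific observations, which is one simple way to read "language descriptions at varying levels of detail". A holistic query could instead be answered from an internal node's description rather than a leaf; that variation is left out here.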

Paper Structure (42 sections, 17 equations, 1 figure, 1 table)