Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation

Hugues Thomas; Mouli Sivapurapu; Jian Zhang

Abstract

This paper presents the Embedding Pose Graph (EPG), an innovative method that combines the strengths of foundation models with a simple 3D representation suitable for robotics applications. Addressing the need for efficient spatial understanding in robotics, EPG provides a compact yet powerful approach by attaching foundation model features to the nodes of a pose graph. Unlike traditional methods that rely on bulky data formats like voxel grids or point clouds, EPG is lightweight and scalable. It facilitates a range of robotic tasks, including open-vocabulary querying, disambiguation, image-based querying, language-directed navigation, and re-localization in 3D environments. We showcase the effectiveness of EPG in handling these tasks, demonstrating its capacity to improve how robots interact with and navigate through complex spaces. Through both qualitative and quantitative assessments, we illustrate EPG's strong performance and its ability to outperform existing methods in re-localization. Our work introduces a crucial step forward in enabling robots to efficiently understand and operate within large-scale 3D spaces.
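The core idea in the abstract, attaching foundation model features to pose graph nodes and querying them with text or images, can be illustrated with a minimal sketch. It assumes one feature vector per node and a plain cosine-similarity lookup; the names `EPGNode`, `EmbeddingPoseGraph`, `add_node`, and `query` are hypothetical and not taken from the paper, and the embeddings are assumed to come from some vision-language model that maps images and text into a shared space. The paper's actual construction (including its angular partitioning) is not reproduced here.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class EPGNode:
    """Hypothetical node: a 3D position plus one unit-norm feature vector."""
    position: np.ndarray   # shape (3,)
    embedding: np.ndarray  # shape (D,), L2-normalized


@dataclass
class EmbeddingPoseGraph:
    """Illustrative container; not the authors' data structure."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (i, j) index pairs

    def add_node(self, position, embedding):
        # Normalize once at insertion so queries reduce to dot products.
        emb = np.asarray(embedding, dtype=float)
        emb = emb / np.linalg.norm(emb)
        self.nodes.append(EPGNode(np.asarray(position, dtype=float), emb))
        idx = len(self.nodes) - 1
        # Assumption: consecutive poses along the trajectory are connected.
        if idx > 0:
            self.edges.append((idx - 1, idx))
        return idx

    def query(self, query_embedding, top_k=1):
        """Return (node index, cosine similarity) pairs, best match first."""
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        feats = np.stack([n.embedding for n in self.nodes])  # (N, D)
        scores = feats @ q                                   # cosine similarities
        order = np.argsort(-scores)[:top_k]
        return [(int(i), float(scores[i])) for i in order]
```

Under these assumptions, storage grows with the number of poses rather than with the number of voxels or points, which is the compactness argument the abstract makes. A text or image query is embedded by the same model and passed to `query`, and the positions of the returned nodes give candidate locations in the scene.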

Paper Structure (22 sections, 3 equations, 7 figures, 2 tables)

INTRODUCTION
RELATED WORK
BUILDING OF AN EPG
Initial Setup and Data Structure
Efficient Construction Process
EPG, A VERSATILE TOOL
Open-vocabulary Queries
Disambiguation
Language-Directed Navigation
Image-Based Queries
Re-Localization
EXPERIMENTS
Datasets
ScanNet
KITTI
...and 7 more sections

Figures (7)

Figure 1: An Embedding Pose Graph visualized with the scene mesh on the ScanNet dataset.
Figure 2: EPG is a compact and versatile tool for spatial understanding in various robotics applications. Once built for a 3D scene, it allows efficient text or image query.
Figure 3: Illustration of the angular partitioning in EPG for different values of $d\theta$ and $d\phi$.
Figure 4: Illustration of the building process of an EPG.
Figure 5: Our Spatial Gaussian Voting scheme for bundle re-localization illustrated with a $K_b=3$ bundle size and $K_c=2$ candidates for clarity.
...and 2 more figures
