Semantic Environment Atlas for Object-Goal Navigation

Nuri Kim, Jeongho Park, Mineui Hong, Songhwai Oh

TL;DR

The paper introduces the Semantic Environment Atlas (SEA), a low-cost semantic navigation framework for embodied agents that builds Semantic Graph Maps (SGMs) consisting of place, image, and object nodes to capture place–place and place–object relationships. SEA uses a global policy for semantic path planning over a fused atlas of multiple environments and a local policy based on fast marching toward semantic subgoals, all without relying on a global pose sensor. Key contributions include the definition of place reachability $\boldsymbol{\Gamma}$ and the place–object connection matrix $\mathbf{R}$, Bayesian-like adaptive relation updates, and demonstrated gains on Matterport3D (MP3D) for object-goal navigation (e.g., 39.0% success, 12.4% above state-of-the-art) with robustness to pose noise and low computational cost. The results show that incorporating semantic relationships improves both localization and navigation efficiency, enabling robust, adaptable exploration in unseen environments without a global metric map, and opening avenues for future work in language grounding, dynamic settings, and interactive mapping.

Abstract

In this paper, we introduce the Semantic Environment Atlas (SEA), a novel mapping approach designed to enhance visual navigation capabilities of embodied agents. The SEA utilizes semantic graph maps that intricately delineate the relationships between places and objects, thereby enriching the navigational context. These maps are constructed from image observations and capture visual landmarks as sparsely encoded nodes within the environment. The SEA integrates multiple semantic maps from various environments, retaining a memory of place-object relationships, which proves invaluable for tasks such as visual localization and navigation. We developed navigation frameworks that effectively leverage the SEA, and we evaluated these frameworks through visual localization and object-goal navigation tasks. Our SEA-based localization framework significantly outperforms existing methods, accurately identifying locations from single query images. Experimental results in Habitat scenarios show that our method not only achieves a success rate of 39.0%, an improvement of 12.4% over the current state-of-the-art, but also maintains robustness under noisy odometry and actuation conditions, all while keeping computational costs low.

Paper Structure

This paper contains 60 sections, 6 equations, 24 figures, 3 tables.

Figures (24)

  • Figure 1: Overview of the Semantic Environment Atlas (SEA). The semantic graph map (SGM) is updated using visual observations. Then, the place relationships and place-object connections across environments are updated using multiple semantic graph maps collected from different environments. A global policy samples a subgoal $g_t$ that is reachable and most likely to be near the target place. A local policy generates navigational actions to reach the subgoal.
  • Figure 2: Construction of semantic graph map. By integrating the current observation and the previous semantic graph map (SGM; $E_{t-1}$), the graph map is updated. If it is discovered that the current location differs from the previous location, an image node is added to the graph. Similarly, object nodes are added to the graph when previously undetected objects are detected.
  • Figure 3: Formation of Semantic Graph Map from Training Environments. The figure illustrates the process of forming a semantic graph map from multiple training environments. In each training environment, each pair of place clusters is checked for a connection. If there is a connection between the pair of places, the reachability is set to one; otherwise, zero.
  • Figure 4: Adaptive SEA update procedure.
  • Figure 5: Training cost.
  • ...and 19 more figures
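
The reachability rule described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `place_reachability`, the input format (per-environment lists of connected place-cluster index pairs), the symmetry of the relation, and the merging of environments by union are all assumptions made for illustration.

```python
import numpy as np


def place_reachability(num_places, env_connections):
    """Build a binary place-reachability matrix (hypothetical helper).

    env_connections: one entry per training environment, each an iterable of
    (i, j) place-cluster index pairs that were observed to be connected.
    """
    gamma = np.zeros((num_places, num_places), dtype=np.int8)
    for pairs in env_connections:
        for i, j in pairs:
            # A connection seen in any environment sets reachability to one.
            gamma[i, j] = 1
            # Assumption: reachability is treated as symmetric.
            gamma[j, i] = 1
    # Pairs never observed as connected stay at zero.
    return gamma


# Hypothetical place clusters: 0 = bedroom, 1 = hallway, 2 = kitchen.
envs = [
    [(0, 1)],  # environment A: bedroom connects to hallway
    [(1, 2)],  # environment B: hallway connects to kitchen
]
gamma = place_reachability(3, envs)
print(gamma)
```

In this toy example the bedroom and kitchen are never observed as directly connected, so their entry remains zero, and a planner working over such a matrix would have to route through the hallway.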