SENT Map -- Semantically Enhanced Topological Maps with Foundation Models

Raj Surya Rajendran Kathirvel, Zach A Chavis, Stephen J. Guy, Karthik Desingh

TL;DR

SENT-Map tackles autonomous indoor navigation and manipulation by grounding foundation-model (FM) planning in a topological map. The map is stored as JSON and used in a two-stage workflow. In the first stage, an operator works with a Vision-FM to build a human-editable map ${\mathcal{M}}=G(V,E)$ with semantic subset $V_{SE}$. In the second stage, a Planning-FM takes the map and a natural-language query and produces a grounded task plan constrained by the robot's skills. The key contributions are the SENT-Map representation, a framework for human-guided map construction, and an end-to-end planning approach that remains robust even for small locally-deployable FMs, demonstrated on tasks requiring semantic reasoning and object ownership. The representation is transparent and editable, so humans can verify and refine the semantic content that planning in open-world indoor environments relies on.

Abstract

We introduce SENT-Map, a semantically enhanced topological map for representing indoor environments, designed to support autonomous navigation and manipulation by leveraging advancements in foundation models (FMs). By representing the environment in a JSON text format, we enable semantic information to be added and edited in a format that both humans and FMs understand, while grounding the robot to existing nodes during planning to avoid infeasible states during deployment. Our proposed framework employs a two-stage approach: first mapping the environment alongside an operator with a Vision-FM, then using the SENT-Map representation alongside a natural-language query within an FM for planning. Our experimental results show that semantic enhancement enables even small locally-deployable FMs to successfully plan over indoor environments.
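
To make the idea of a JSON map with grounded planning more concrete, the following is a minimal sketch, not the authors' implementation. The JSON field names (`nodes`, `edges`, `semantic`, `objects`), the node names, the skill set, and the helpers `load_map` and `validate_plan` are all hypothetical and chosen for illustration. The sketch loads a small graph, extracts the semantically enhanced nodes, and rejects any plan step that names an unknown skill or a node that does not exist in the map.

```python
import json

# Hypothetical SENT-Map-style JSON. The schema is an assumption for illustration only.
SENT_MAP_JSON = """
{
  "nodes": {
    "kitchen_counter": {"semantic": true,
                        "description": "Counter with a coffee machine",
                        "objects": ["mug"]},
    "hallway_1": {"semantic": false},
    "desk_alice": {"semantic": true,
                   "description": "Alice's desk",
                   "objects": ["tissue box"]}
  },
  "edges": [["kitchen_counter", "hallway_1"], ["hallway_1", "desk_alice"]]
}
"""

# Hypothetical skill set the robot exposes to the planner.
ROBOT_SKILLS = {"navigate", "pick", "place"}


def load_map(text):
    """Parse the JSON map into node attributes, an undirected adjacency map,
    and the set of semantically enhanced nodes."""
    data = json.loads(text)
    nodes = data["nodes"]
    adjacency = {name: set() for name in nodes}
    for a, b in data["edges"]:
        adjacency[a].add(b)
        adjacency[b].add(a)
    semantic = {name for name, attrs in nodes.items() if attrs.get("semantic")}
    return nodes, adjacency, semantic


def validate_plan(plan, nodes, skills):
    """Return a list of problems; an empty list means every step uses a
    known skill and refers to a node that exists in the map."""
    problems = []
    for i, step in enumerate(plan):
        if step["skill"] not in skills:
            problems.append(f"step {i}: unknown skill {step['skill']!r}")
        if step["node"] not in nodes:
            problems.append(f"step {i}: unknown node {step['node']!r}")
    return problems


nodes, adjacency, semantic = load_map(SENT_MAP_JSON)
good_plan = [{"skill": "navigate", "node": "desk_alice"},
             {"skill": "pick", "node": "desk_alice"}]
bad_plan = [{"skill": "fly", "node": "roof"}]
print(validate_plan(good_plan, nodes, ROBOT_SKILLS))
print(validate_plan(bad_plan, nodes, ROBOT_SKILLS))
```

Because the map is plain JSON, an operator could correct a description or add an object with a text editor, and the same check would apply to any plan an FM returns.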

Paper Structure

This paper contains 11 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: SENT-Map Framework. (a) An operator defines a map alongside a robot. (b) Images and operator prompt are given to a Scene Representation FM, which outputs a node in JSON. (c) A collection of nodes defines our semantic graph. Due to the interpretability of JSON, the operator is free to make additions or corrections within the JSON. (d) The full JSON graph is fed to a planning FM alongside a query, and a skill sequence is output. (e) The robot then executes the skill sequence within the environment.
  • Figure 2: Semantic Ambiguity. The topological map of our indoor environment contains several instances of drinks and tables, two nodes of which are pictured here. When given a task asking for a "tissue", each FM knows that a desk or table is a likely location for a tissue box, but is forced to make a guess without additional semantic context. Similarly, the FM must guess between drinks when queried for a beverage with no context of who a drink belongs to.
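
The ambiguity described in Figure 2 can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's method: the node names, the `objects` list, the `owner` field, and the helper `candidate_nodes` are all hypothetical. It shows that a query for a "drink" matches several nodes when no ownership is recorded in the query, while an added owner annotation narrows the match to one node.

```python
# Hypothetical map fragment: two nodes, each holding a drink with a different owner.
NODES = {
    "desk_a": {"objects": [{"type": "drink", "owner": "Alice"}]},
    "table_b": {"objects": [{"type": "drink", "owner": "Bob"}]},
    "shelf_c": {"objects": [{"type": "book"}]},
}


def candidate_nodes(nodes, item, owner=None):
    """Return the sorted names of nodes holding an object of type `item`.
    If `owner` is given, only objects annotated with that owner count."""
    matches = []
    for name, attrs in nodes.items():
        holds_item = any(
            obj["type"] == item and (owner is None or obj.get("owner") == owner)
            for obj in attrs.get("objects", [])
        )
        if holds_item:
            matches.append(name)
    return sorted(matches)


# Without ownership context, two nodes are equally plausible.
print(candidate_nodes(NODES, "drink"))
# With ownership context, the choice is unambiguous.
print(candidate_nodes(NODES, "drink", owner="Bob"))
```

In the framework itself the disambiguation is done by the planning FM reading the JSON, not by an explicit filter; the filter here only makes visible what extra semantic context adds.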