
Topo-Field: Topometric mapping with Brain-inspired Hierarchical Layout-Object-Position Fields

Jiawei Hou, Wenhao Guan, Longfei Liang, Jianfeng Feng, Xiangyang Xue, Taiping Zeng

TL;DR

Topo-Field proposes a brain-inspired topometric mapping framework that fuses Layout-Object-Position (LOP) associations into a neural implicit field $F:\mathbb{R}^3\to\mathbb{R}^n$, trained via a contrastive loss against CLIP and Sentence-BERT embeddings. It constructs a topometric graph $G=(V,E)$ by querying $F$ and uses LLMs to bootstrap relationships, updating with new observations to support planning. Evaluations on Matterport3D and apartment scenes demonstrate strong position attribute inference, accurate text/image query localization, and efficient navigation planning, illustrating a practical bridge between semantic scene understanding and real-time robotic operation. The approach reduces annotation demands by leveraging Large Foundation Models while preserving navigable, semantically informed representations.
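The TL;DR states that $F$ is trained with a contrastive loss against CLIP and Sentence-BERT embeddings but does not give the exact form. A minimal sketch, assuming a standard InfoNCE objective over a batch of sampled 3D points: row $i$ of the predicted features should match row $i$ of the target embeddings, and every other row in the batch acts as a negative. The function names, the temperature of 0.1, and the weighted sum of one term per target space in `lop_loss` are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce_loss(pred, target, temperature=0.1):
    """InfoNCE loss where row i of `pred` should match row i of `target`.

    pred:   (B, n) features predicted by the field F at B sampled 3D points
    target: (B, n) target embeddings (e.g. CLIP or Sentence-BERT) for the same points
    """
    p = l2_normalize(pred)
    t = l2_normalize(target)
    logits = p @ t.T / temperature                        # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the positive pairs; off-diagonal entries act as negatives
    return float(-np.mean(np.diag(log_probs)))


def lop_loss(pred_vl, target_vl, pred_sem, target_sem, w_sem=1.0):
    # Hypothetical combination: one contrastive term per target embedding space
    return info_nce_loss(pred_vl, target_vl) + w_sem * info_nce_loss(pred_sem, target_sem)


# Toy check: matched rows give a small loss, shuffled rows give a large one
targets = np.eye(4)
print(info_nce_loss(targets, targets))                      # close to 0
print(info_nce_loss(np.roll(targets, 1, axis=0), targets))  # roughly 10
```

Minimizing this loss pulls $F(x)$ toward the embedding assigned to position $x$ and away from the embeddings of other sampled positions in the same batch.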

Abstract

Mobile robots require comprehensive scene understanding to operate effectively in diverse environments, enriched with contextual information such as layouts, objects, and their relationships. Although advances like neural radiance fields (NeRFs) offer high-fidelity 3D reconstructions, they are computationally intensive and often lack efficient representations of the traversable space essential for planning and navigation. In contrast, topological maps are computationally efficient but lack the semantic richness needed for a more complete understanding of the environment. Inspired by a population code in the postrhinal cortex (POR) that is strongly tuned to spatial layout rather than scene content and rapidly forms a high-level cognitive map, this work introduces Topo-Field, a framework that integrates Layout-Object-Position (LOP) associations into a neural field and constructs a topometric map from this learned representation. LOP associations are modeled by explicitly encoding object and layout information, while Large Foundation Models (LFMs) enable efficient training without extensive annotations. The topometric map is then constructed by querying the learned neural representation, offering both semantic richness and computational efficiency. Empirical evaluations in multi-room environments demonstrate the effectiveness of Topo-Field in tasks such as position attribute inference, query localization, and topometric planning, bridging the gap between high-fidelity scene understanding and efficient robotic navigation.
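The abstract says the topometric map is built by querying the learned field. One simple way to picture this, assuming the field can be decoded into a region label at any position, is to sample a grid, group cells by label into region nodes placed at their centroids, connect regions whose cells touch, and then run Dijkstra's algorithm over the resulting graph $G=(V,E)$ for planning. Here `query_region` is a hypothetical stand-in for decoding the field, the 2D grid simplifies the 3D setting, and the LLM-based relationship bootstrapping and object-level nodes mentioned in the TL;DR are omitted.

```python
import heapq
import math
from collections import defaultdict


def build_topometric_graph(query_region, xs, ys):
    """Label grid cells by querying the field, then merge them into region nodes.

    query_region: callable (x, y) -> region label (hypothetical stand-in for
                  decoding the layout prediction of the learned field F)
    """
    labels = {}
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # per region: sum_x, sum_y, count
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            region = query_region(x, y)
            labels[(i, j)] = region
            acc = sums[region]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    # Node position = centroid of all grid cells assigned to that region
    nodes = {r: (s[0] / s[2], s[1] / s[2]) for r, s in sums.items()}
    edges = defaultdict(dict)
    for (i, j), region in labels.items():
        for di, dj in ((1, 0), (0, 1)):
            nb = labels.get((i + di, j + dj))
            if nb is not None and nb != region:
                # Neighboring cells with different labels mean the regions touch
                d = math.dist(nodes[region], nodes[nb])
                edges[region][nb] = d
                edges[nb][region] = d
    return nodes, edges


def shortest_path(edges, start, goal):
    """Dijkstra's algorithm over the region graph; returns (cost, [regions...])."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nb, w in edges[node].items():
            if nb not in visited:
                heapq.heappush(frontier, (cost + w, nb, path + [nb]))
    return math.inf, []


def toy_region(x, y):
    # Hypothetical floor plan: kitchen | hallway | bedroom along the x axis
    if x < 3:
        return "kitchen"
    if x < 4:
        return "hallway"
    return "bedroom"


nodes, edges = build_topometric_graph(toy_region, [0.5 + k for k in range(7)], [0.5, 1.5])
cost, path = shortest_path(edges, "kitchen", "bedroom")
print(cost, path)  # 4.0 ['kitchen', 'hallway', 'bedroom']
```

Because the kitchen and bedroom cells never touch in this toy layout, the planner must route through the hallway node, which is the kind of coarse, navigable structure a topometric map is meant to expose.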

Paper Structure

This paper contains 19 sections, 7 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Illustration of the Topo-Field strategy and capabilities. Scene information is divided hierarchically into layout, object, and position, each modeled explicitly; the resulting layout-object-position associations give the robot a topometric map of the scene and the ability to plan navigable paths, supporting more comprehensive spatial cognition.
  • Figure 2: Pipeline of Topo-Field. (a) Generation of the ground-truth layout-object-position vision-language and semantic embeddings used for weak supervision. (b) The neural implicit network mapping 3D positions to the target feature space; a contrastive loss is optimized between the predicted features and these target embeddings. (c) The topometric mapping process using the trained neural field.
  • Figure 3: Qualitative comparison of text query localization results between state-of-the-art methods and our method, using text input of the form "object in the region". Blue boxes show the ground-truth bounding box of the object; red boxes mark incorrect predictions and green boxes mark correct ones.
  • Figure 4: Capabilities of the learned neural field. (a) Attribute inference from position input. (b) Localization of text and image queries aided by the LOP associations.
  • Figure 5: Qualitative comparison of image query localization results, shown as heatmaps, between state-of-the-art methods and our method with image input. Our approach localizes the queried image within a smaller, more precise region.
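Figures 3 to 5 describe localizing text and image queries (including "object in the region" text queries) and inferring attributes from a position. A minimal sketch of both directions, assuming the field's per-point features live in the same space as the query embeddings and are compared by cosine similarity: `localize_query` ranks sampled points for a single query, `localize_object_in_region` adds an object score and a region score as a hypothetical LOP-style combination, and `infer_attribute` maps one point's feature to the closest label embedding. All function names and the additive scoring rule are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def cosine_scores(feats, query):
    # Cosine similarity between each row of `feats` (N, n) and one query vector (n,)
    q = query / np.linalg.norm(query)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ q


def localize_query(query_emb, points, point_feats, top_k=1):
    """Return the top-k positions whose field features best match a text/image query.

    query_emb:   (n,) query embedding (e.g. from a CLIP text or image encoder)
    points:      (N, 3) candidate 3D positions sampled in the scene
    point_feats: (N, n) features predicted by the field at those positions
    """
    scores = cosine_scores(point_feats, query_emb)
    order = np.argsort(-scores)[:top_k]
    return points[order], scores[order]


def localize_object_in_region(obj_emb, region_emb, points, obj_feats, region_feats, top_k=1):
    # Hypothetical LOP-style rule: reward points that match both the object and the region
    scores = cosine_scores(obj_feats, obj_emb) + cosine_scores(region_feats, region_emb)
    order = np.argsort(-scores)[:top_k]
    return points[order], scores[order]


def infer_attribute(point_feat, label_embs, labels):
    # Position -> attribute: choose the label whose embedding is closest to F(x)
    scores = cosine_scores(label_embs, point_feat)
    return labels[int(np.argmax(scores))]
```

Evaluated over a dense set of sampled points, the score vector from `cosine_scores` can also be rendered directly as a heatmap, similar in spirit to the visualizations described for Figure 5.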