Language to Map: Topological map generation from natural language path instructions

Hideki Deguchi, Kazuki Shibata, Shun Taguchi

TL;DR

This work addresses the high user effort required in vision-and-language navigation by enabling a map to be built from natural-language path descriptions and used to generate routes from destination names. It contrasts an implicit-memory approach in large language models with an explicit topological map approach that stores waypoints and per-node actions, showing the explicit method achieves markedly higher accuracy. The explicit-map system, using a canonical representation and LLM-assisted translation, substantially improves the ability to infer spatial structure and generate new paths, achieving over 90% success on reverse-path tasks and high success on recombined paths. This approach advances NL-driven robotic navigation by eliminating the need to repeatedly describe detailed paths and enabling robust path planning from language alone, with future work targeting instruction diversity and larger-scale environments.

Abstract

This paper proposes a method for generating a map from path information described in natural language (a textual path). In recent years, robotics research has focused mainly on vision-and-language navigation (VLN), a navigation task based on images and textual paths. Although VLN is expected to make it easier for users to instruct robots, current implementations require users to explain the path in detail for every navigation session, which imposes a high explanation cost. To solve this problem, we propose a method that builds a topological map from textual paths and automatically generates new paths from this map. We believe that large language models (LLMs) can be used to understand textual paths. Therefore, we propose and evaluate two methods: one that stores an implicit map in the LLM, and one that generates an explicit map using the LLM. The implicit map resides in the LLM's memory and is created through prompts. The explicit map is a topological map composed of nodes and edges, with the actions at each node stored alongside. This makes it possible to estimate the path, and the actions at its waypoints, for a path that was never described, provided enough information is available. Experimental results on path instructions generated in a real environment demonstrate that generating explicit maps achieves significantly higher accuracy than storing implicit maps in the LLMs.
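
The explicit-map idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `TopologicalMap`, its methods, the node names, and the three-word turn vocabulary are all hypothetical, and the LLM step that translates free text into node and action sequences is assumed to have already happened. The sketch stores waypoints as nodes, keeps the action taken at each waypoint, and shows how a reverse path can be derived while a recombined path may lack action information.

```python
from collections import deque

# Hypothetical action vocabulary; the paper's canonical representation may differ.
REVERSE_TURN = {"left": "right", "right": "left", "straight": "straight"}


class TopologicalMap:
    """Explicit map sketch: nodes are waypoints, edges are traversable links."""

    def __init__(self):
        self.edges = {}    # node -> set of neighbouring nodes
        self.actions = {}  # (previous, node, next) -> turn taken at node

    def add_path(self, nodes, turns):
        """Store one described path; turns[i] is the action at nodes[i + 1]."""
        if len(turns) != len(nodes) - 2:
            raise ValueError("need exactly one turn per intermediate waypoint")
        for a, b in zip(nodes, nodes[1:]):
            self.edges.setdefault(a, set()).add(b)
            self.edges.setdefault(b, set()).add(a)
        for i, turn in enumerate(turns):
            prev, node, nxt = nodes[i], nodes[i + 1], nodes[i + 2]
            self.actions[(prev, node, nxt)] = turn
            # Walking the same corner in the opposite direction mirrors the turn.
            self.actions[(nxt, node, prev)] = REVERSE_TURN[turn]

    def find_path(self, start, goal):
        """Breadth-first search for a shortest node sequence, or None."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for neighbour in sorted(self.edges.get(path[-1], ())):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(path + [neighbour])
        return None

    def actions_along(self, path):
        """Action at each intermediate waypoint; None where it was never described."""
        return [self.actions.get(triple) for triple in zip(path, path[1:], path[2:])]


# Two described paths sharing the waypoint "n2" (a hypothetical layout).
topo = TopologicalMap()
topo.add_path(["n1", "n2", "n3"], ["left"])
topo.add_path(["n1", "n2", "n4"], ["right"])

reverse_path = topo.find_path("n3", "n1")
reverse_actions = topo.actions_along(reverse_path)

recombined_path = topo.find_path("n3", "n4")
recombined_actions = topo.actions_along(recombined_path)
```

In this sketch the reverse path from `n3` to `n1` recovers its action by mirroring the stored turn, whereas the recombined path from `n3` to `n4` is found in the graph but its action at `n2` stays `None`, since neither described path passed through that corner in that direction. This mirrors the abstract's caveat that undescribed paths can be estimated only if enough information is available.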

Paper Structure

This paper contains 15 sections, 9 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the task in this study. The upper part shows the map and the paths. The sentences in blue font beneath it describe the green and blue paths. The task is to take detailed path descriptions such as these sentences as input and to output a new path given a destination name.
  • Figure 2: System overview. The left side shows the system that generates a map from the user's instructions. The right side shows the system that outputs a new path from a destination name.
  • Figure 3: The environment is shown in the figure below, where $n_1$ to $n_5$ are the names of the nodes.
  • Figure 4: Flowchart of the LLM prompts used in this study.
  • Figure 5: Example of a map with path generation. The red circles mark the start and destination nodes, and the yellow circles mark the waypoint nodes. The yellow lines show the edges.
  • ...and 2 more figures