Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs

Fujing Xie, Sören Schwertfeger

TL;DR

This paper explores enabling LLMs to comprehend the topology and hierarchy of Area Graph, a text-based hierarchical, topometric semantic map representation utilizing polygons to demark areas such as rooms or buildings.

Abstract

Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge. Mobile robots rely on map comprehension for tasks like localization and navigation. In this paper, we explore enabling LLMs to comprehend the topology and hierarchy of Area Graph, a text-based hierarchical, topometric semantic map representation utilizing polygons to demark areas such as rooms or buildings. Our experiments demonstrate that with the right map representation, LLMs can effectively comprehend Area Graph's topology and hierarchy. After straightforward fine-tuning, the LLaMA2 models exceeded ChatGPT-3.5 in mastering these aspects. Our dataset, dataset generation code, and fine-tuned LoRA adapters are available at https://github.com/xiefujing/LLM-osmAG-Comprehension.

Paper Structure (19 sections, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Top: a real-life situation encountered by a third-party delivery robot on our university campus, where it is blocked by an intersection closure. Bottom: the notification sent by the Office of General Services announcing this closure.
  • Figure 2: The osmAG visualization in JOSM displays areas as blue polygons, passages as red lines, and node coordinates as yellow dots. ChatGPT-4 identifies the shortest path via a two-way elevator (red). After being given an elevator maintenance notice, it recommends an alternate route (green).
  • Figure 3: Our prompts include two main elements: a task description in a blue box and the osmAG map in a green box. We offer three prompt levels: Level 1 with just the description and map, Level 2 with a simple example in the upper yellow box, and Level 3 with a detailed example in the lower yellow box.
  • Figure 4: The osmAG map representation and two of its variants. The osmAG enclosed in the blue box represents the original format, which uses 'passage' to describe connections between areas. The yellow box illustrates Variant 1 of osmAG, which introduces a tag with a key set to "connected_area" and a value corresponding to the area connected via the passage. Variant 2 of osmAG, displayed in the green box, modifies Variant 1 by replacing "connected_area" with "current area name_directly_connected_room".
  • Figure 5: Hand-drawn map layout templates created using JOSM, with areas depicted as blue polygons, passages represented by red lines, and room names in red. Templates (a), (b), and (c) were utilized to generate datasets for fine-tuning the LLaMA2 model. Template (d), on the other hand, was exclusively used in the test dataset to assess the LLM's capacity to adapt to unseen layouts.
  • ...and 2 more figures
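
The three prompt levels described in the Figure 3 caption can be pictured with a minimal sketch. This is not the authors' code: the function name `build_prompt`, its parameters, the ordering of the parts (description, then example, then map), and the assumption that each level carries at most one example are all hypothetical choices made for illustration.

```python
def build_prompt(level, task_description, osmag_map,
                 simple_example="", detailed_example=""):
    """Assemble a prompt for one of the three levels in the Figure 3 caption.

    Hypothetical helper: the part ordering and the one-example-per-level
    rule are assumptions, not taken from the paper.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")

    parts = [task_description]          # every level has the task description
    if level == 2:
        parts.append(simple_example)    # Level 2 adds a simple example
    elif level == 3:
        parts.append(detailed_example)  # Level 3 adds a detailed example
    parts.append(osmag_map)             # every level has the osmAG map text

    return "\n\n".join(parts)
```

Calling `build_prompt(1, task, osmag_text)` yields only the description and the map, while levels 2 and 3 insert the corresponding example between them.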