From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation

Yudai Noda, Kanji Tanaka

TL;DR

This work proposes a transition from reactive AI to Map-Based AI by integrating LLM-based semantic inference with a hybrid topological-grid mapping system, enabling the agent to prioritize high-probability areas and perform systematic exploration via Traveling Salesman Problem (TSP) optimization.

Abstract

Object-Goal Navigation (ObjectNav) requires an agent to find and navigate to an instance of a target object category in an unknown environment. While recent Large Language Model (LLM)-based agents exhibit zero-shot reasoning, they often rely on a "reactive" paradigm that lacks explicit spatial memory, leading to redundant exploration and myopic behavior. To address these limitations, we propose a transition from reactive AI to "Map-Based AI" by integrating LLM-based semantic inference with a hybrid topological-grid mapping system. Our framework employs a Llama-2 model fine-tuned with Low-Rank Adaptation (LoRA) to infer semantic zone categories and target existence probabilities from verbalized object observations. In this study, a "zone" is defined as a functional area described by the set of objects observed in it, which provides semantic co-occurrence cues for finding the target. This semantic information is integrated into a topological graph, enabling the agent to prioritize high-probability areas and perform systematic exploration via Traveling Salesman Problem (TSP) optimization. Evaluations in the AI2-THOR simulator demonstrate that our approach significantly outperforms traditional frontier exploration and reactive LLM baselines in both Success Rate (SR) and Success weighted by Path Length (SPL).
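The abstract does not spell out how probability-based prioritization and TSP ordering are combined, so the following is only a minimal sketch of one plausible reading: zones whose inferred target existence probability falls below a threshold are dropped, and the remaining zones are visited in the shortest open tour from the agent's position. The function name `plan_zone_tour`, the `min_prob` threshold, the zone coordinates, and the use of straight-line distance (rather than path lengths on the topological graph) are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import permutations
from math import dist


def plan_zone_tour(start, zones, min_prob=0.2):
    """Hypothetical planner: filter zones by probability, then order them.

    start: (x, y) position of the agent.
    zones: dict mapping zone name -> ((x, y), target existence probability).
    Returns (visiting order as a list of names, total tour length).
    """
    # Keep only zones the (assumed) LLM scorer considers promising enough.
    candidates = {name: pos for name, (pos, p) in zones.items() if p >= min_prob}

    best_order, best_len = (), float("inf")
    # Exhaustive search over visiting orders; fine for a handful of zones,
    # a real system would need a heuristic TSP solver for larger graphs.
    for order in permutations(candidates):
        length, here = 0.0, start
        for name in order:
            length += dist(here, candidates[name])  # assumed Euclidean cost
            here = candidates[name]
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


# Illustrative values only; these are not results from the paper.
example_zones = {
    "kitchen": ((4.0, 0.0), 0.7),
    "bedroom": ((1.0, 0.0), 0.5),
    "bathroom": ((0.0, 5.0), 0.05),
}
order, length = plan_zone_tour((0.0, 0.0), example_zones)
```

In this toy setup the low-probability bathroom is skipped and the tour covers only the two remaining zones. Thresholding is just one way to prioritize; weighting edge costs by probability would be another, and the abstract does not indicate which the authors use.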

Paper Structure (34 sections, 2 equations, 5 figures)

Figures (5)

  • Figure 1: Concept of the proposed memory-driven planning. Compared to (a) traditional observation-driven planning, which often results in myopic and redundant behaviors due to the lack of spatial memory, our approach transitions toward (b) map-conditioned planning. By integrating a hybrid semantic map with LLM-based reasoning, the agent achieves (c) efficient object-goal navigation through spatially-grounded decision-making and systematic exploration.
  • Figure 2: Architecture of the proactive exploration system. It integrates semantic mapping, spatial correlation reasoning, and a strategic path planner.
  • Figure 3: Overview of simulation environments. (a-b) show the spatial and semantic definitions, while (c-e) present the top-down views of various test scenes.
  • Figure 4: First-person observations in living and sleeping quarters. These images represent the visual input for the object detection and localization modules.
  • Figure 5: Quantitative performance evaluation. Our proposed method significantly outperforms baseline approaches in both search speed and path efficiency.
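For readers unfamiliar with the two metrics named in the abstract, the sketch below computes Success Rate and SPL using the definition commonly used in embodied navigation benchmarks, where each successful episode is weighted by the ratio of the shortest-path length to the longer of the shortest-path length and the path actually taken. The function names and episode values are illustrative assumptions; the paper's exact evaluation protocol is not shown in this excerpt.

```python
def success_rate(successes):
    """Fraction of episodes in which the target was reached."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes: 1 for a successful episode, 0 otherwise.
    shortest_lengths: shortest-path length from start to goal per episode.
    path_lengths: length of the path the agent actually travelled.
    """
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        if s and denom > 0:  # failed episodes contribute zero
            total += shortest / denom
    return total / len(successes)


# Illustrative episodes only; these are not results from the paper.
sr_value = success_rate([1, 0, 1])
spl_value = spl([1, 0, 1], [2.0, 3.0, 4.0], [4.0, 5.0, 4.0])
```

An agent that always succeeds along the shortest path scores an SPL of 1, while detours or failures pull the value toward 0, which is why SPL is read as a path-efficiency measure alongside the raw success rate.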