
GET: Goal-directed Exploration and Targeting for Large-Scale Unknown Environments

Lanxiang Zheng, Ruidong Mei, Mingxin Wei, Hao Ren, Hui Cheng

TL;DR

GET addresses object search in large-scale unknown environments by integrating LLM-based semantic reasoning with memory-guided exploration. It introduces the Diagram of Unified Thought (DoUT) for real-time, feedback-driven decision refinement and uses a Gaussian Mixture Model (GMM)-based Task Probability Map to continually update target-location priors. A Semantic Octomap and a Trajectory Refinement module complete the perception and navigation stack, enabling safe and efficient exploration. Real-world experiments across multiple scenes and LLMs show that GET substantially reduces path length and search time compared with heuristic and LLM-only baselines, demonstrating scalable embodied decision-making in complex environments.
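The Semantic Octomap is described only at a high level here: semantic point clouds are fused into a map that records what was seen where. The sketch below illustrates that idea under loud assumptions. It swaps the octree for a flat voxel hash for brevity, and the class name `SemanticVoxelMap`, the 0.2 m default resolution, and the majority-vote labeling rule are all hypothetical, not GET's actual implementation.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Optional, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


class SemanticVoxelMap:
    """Hypothetical flat voxel hash standing in for a semantic octree.

    Each voxel keeps a histogram of the semantic labels of the points that
    fell into it; the majority label is taken as the voxel's class.
    """

    def __init__(self, resolution: float = 0.2) -> None:
        self.resolution = resolution  # voxel edge length in metres (assumed)
        self.labels: DefaultDict[Voxel, Counter] = defaultdict(Counter)

    def key(self, x: float, y: float, z: float) -> Voxel:
        # floor() keeps negative coordinates in the correct voxel
        r = self.resolution
        return (math.floor(x / r), math.floor(y / r), math.floor(z / r))

    def insert(self, labeled_points: Iterable[Tuple[float, float, float, str]]) -> None:
        """Fuse one frame of a semantic point cloud given as (x, y, z, label)."""
        for x, y, z, label in labeled_points:
            self.labels[self.key(x, y, z)][label] += 1

    def label_at(self, x: float, y: float, z: float) -> Optional[str]:
        counts = self.labels.get(self.key(x, y, z))  # .get avoids creating empty voxels
        return counts.most_common(1)[0][0] if counts else None

    def find(self, label: str) -> List[Point]:
        """Return the centres of all voxels whose majority label is `label`."""
        r = self.resolution
        return [
            ((i + 0.5) * r, (j + 0.5) * r, (k + 0.5) * r)
            for (i, j, k), counts in self.labels.items()
            if counts.most_common(1)[0][0] == label
        ]


# Usage: two "bench" points and one "tree" point land in the same 0.5 m voxel.
m = SemanticVoxelMap(resolution=0.5)
m.insert([(0.1, 0.1, 0.1, "bench"), (0.2, 0.3, 0.1, "bench"), (0.4, 0.2, 0.3, "tree")])
print(m.label_at(0.0, 0.0, 0.0))  # bench
print(m.find("bench"))            # [(0.25, 0.25, 0.25)]
```

A real octree would additionally compress large free or unknown regions and track occupancy probabilities; the flat hash keeps only the label-fusion step visible.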

Abstract

Object search in large-scale, unstructured environments remains a fundamental challenge in robotics, particularly in dynamic or expansive settings such as outdoor autonomous exploration. This task requires robust spatial reasoning and the ability to leverage prior experience. While Large Language Models (LLMs) offer strong semantic capabilities, their application in embodied contexts is limited by a grounding gap in spatial reasoning and insufficient mechanisms for memory integration and decision consistency. To address these challenges, we propose GET (Goal-directed Exploration and Targeting), a framework that enhances object search by combining LLM-based reasoning with experience-guided exploration. At its core is DoUT (Diagram of Unified Thought), a reasoning module that facilitates real-time decision-making through a role-based feedback loop, integrating task-specific criteria and external memory. For repeated tasks, GET maintains a probabilistic task map based on a Gaussian Mixture Model, allowing for continual updates to object-location priors as environments evolve. Experiments conducted in real-world, large-scale environments demonstrate that GET improves search efficiency and robustness across multiple LLMs and task settings, significantly outperforming heuristic and LLM-only baselines. These results suggest that structured LLM integration provides a scalable and generalizable approach to embodied decision-making in complex environments.
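To make "continual updates to object-location priors" concrete, here is a minimal sketch of a multi-layer task probability map: one 2-D GMM per object class, where each confirmed sighting either reinforces a nearby component or spawns a new one, and older evidence is down-weighted. The isotropic shared `sigma`, the `decay` factor, the `merge_dist` rule, and all class and method names are hypothetical choices for illustration; the abstract does not specify GET's actual update rule.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

XY = Tuple[float, float]


@dataclass
class Component:
    mean: XY
    weight: float


class TaskProbabilityMap:
    """Hypothetical multi-layer map: one isotropic 2-D GMM per object class."""

    def __init__(self, sigma: float = 2.0, decay: float = 0.8, merge_dist: float = 1.0) -> None:
        self.sigma = sigma            # shared component std-dev in metres (assumed)
        self.decay = decay            # down-weights older sightings (assumed)
        self.merge_dist = merge_dist  # sightings closer than this reinforce a component
        self.layers: Dict[str, List[Component]] = {}

    def update(self, obj: str, xy: XY) -> None:
        """Record that `obj` was found at `xy` and renormalise the mixture."""
        comps = self.layers.setdefault(obj, [])
        for c in comps:
            c.weight *= self.decay
        for c in comps:
            if math.dist(c.mean, xy) < self.merge_dist:
                w = c.weight
                # weighted running mean of the old centre and the new sighting
                c.mean = ((c.mean[0] * w + xy[0]) / (w + 1), (c.mean[1] * w + xy[1]) / (w + 1))
                c.weight = w + 1.0
                break
        else:  # no nearby component (or empty layer): start a new one
            comps.append(Component(xy, 1.0))
        total = sum(c.weight for c in comps)
        for c in comps:
            c.weight /= total

    def density(self, obj: str, xy: XY) -> float:
        """Evaluate p(xy | obj) = sum_k w_k N(xy; mu_k, sigma^2 I)."""
        s2 = self.sigma ** 2
        return sum(
            c.weight * math.exp(-math.dist(c.mean, xy) ** 2 / (2 * s2)) / (2 * math.pi * s2)
            for c in self.layers.get(obj, [])
        )

    def best_guess(self, obj: str) -> Optional[XY]:
        """Mean of the heaviest component, or None if there is no prior yet."""
        comps = self.layers.get(obj)
        return max(comps, key=lambda c: c.weight).mean if comps else None


tpm = TaskProbabilityMap()
tpm.update("backpack", (10.0, 5.0))  # first search
tpm.update("backpack", (30.0, 0.0))  # object has since been moved
print(tpm.best_guess("backpack"))    # (30.0, 0.0)
print(tpm.best_guess("chair"))       # None: no prior for this object
```

The decay makes the most recent sighting dominate after renormalisation, which is one simple way to let priors track an evolving environment. Treating a `None` prior as the cue to fall back on LLM-based reasoning search is likewise an assumption of this sketch.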

Paper Structure

This paper contains 45 sections, 12 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The GET framework consists of several modules. The Perception module converts panoramic images and point clouds into semantic point clouds. The Memory module maintains a semantic octomap, updated in real time, and a multi-layered task probability map that together record the environment and historical experience. The Decision-Making module operates in two modes: reasoning search, in which the DoUT module infers likely target locations to guide the robot's search, and experience-based search, in which the robot uses historical data and the semantic octomap to locate the target and plan search routes. Finally, the Trajectory Refinement module optimizes the robot's path for safe, smooth, and continuous navigation.
  • Figure 2: An example of a DoUT-based search task. (a) and (b) show the inputs to DoUT: the task description and the environmental panorama. (c1–c3) show candidate propositions generated by the Proposer, and (d1–d3) show the corresponding feedback from the Evaluator. The feedback in (d1–d2) flags violations of the Mandatory Criteria and is returned directly to the Proposer for correction, whereas (d3) shows valid feedback assessed against the Advisory Criteria.
  • Figure 3: Experimental setups for the robot and real-world scenes.
  • Figure 4: Comparison of the proposed algorithm's search trajectories against the benchmarks. (a) and (b) show searches without prior experience, while (c) and (d) show searches that draw on historical search experience, with different starting positions and target objects.
  • Figure 5: Comparison of DoT and DoUT structures. (a) DoT uses a sequential, iterative reasoning process represented by a directed acyclic graph, in which the LLM generates propositions, evaluates them through critiques, and refines or verifies its outputs based on feedback. (b) DoUT extends this into a parallel framework by applying unified evaluation criteria through an external Evaluator, which ranks the LLM's candidate propositions against mandatory and advisory criteria and returns structured feedback that improves decision-making and reasoning efficiency.
  • ...and 4 more figures
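The Proposer/Evaluator interplay described in Figures 2 and 5 can be sketched as a small control loop: candidates that violate any Mandatory Criterion are sent back to the Proposer as feedback, and the surviving candidates are ranked by Advisory Criteria. This is a minimal sketch under stated assumptions, not the authors' code: the LLM is replaced by a stub function, and `dout_decide`, `Proposition`, the example criteria, and the summed-score ranking rule are all hypothetical. The whole batch of candidates in each round is evaluated against the same criteria, loosely mirroring the parallel structure in Figure 5(b).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Proposition:
    target: str                    # region the robot should search next
    position: Tuple[float, float]  # its location in the map frame (assumed 2-D)


Proposer = Callable[[List[str]], List[Proposition]]     # feedback -> candidates
MandatoryCheck = Callable[[Proposition], Optional[str]]  # violation message or None
AdvisoryScore = Callable[[Proposition], float]           # higher is better


def dout_decide(propose: Proposer,
                mandatory: Sequence[MandatoryCheck],
                advisory: Sequence[AdvisoryScore],
                max_rounds: int = 3) -> Optional[Proposition]:
    """Hypothetical Proposer/Evaluator loop in the spirit of DoUT."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidates = propose(feedback)
        feedback = []
        valid: List[Proposition] = []
        for p in candidates:
            violations = [msg for msg in (check(p) for check in mandatory) if msg]
            if violations:
                # Mandatory failures go straight back to the Proposer
                feedback.extend(f"{p.target}: {msg}" for msg in violations)
            else:
                valid.append(p)
        if valid:
            # Among valid candidates, rank by the summed Advisory scores
            return max(valid, key=lambda p: sum(score(p) for score in advisory))
    return None  # no valid proposition within the round budget


# Stubs standing in for the LLM and the task criteria (all hypothetical).
def stub_proposer(feedback: List[str]) -> List[Proposition]:
    if not feedback:
        return [Proposition("lake shore", (-5.0, 2.0))]
    return [Proposition("parking lot", (12.0, 0.0)), Proposition("bench area", (4.0, 3.0))]


def inside_mapped_area(p: Proposition) -> Optional[str]:
    return None if p.position[0] >= 0 else "outside the mapped area"


def prefer_nearby(p: Proposition) -> float:
    return -math.hypot(*p.position)  # robot assumed to be at the origin


best = dout_decide(stub_proposer, [inside_mapped_area], [prefer_nearby])
print(best.target)  # bench area
```

In the first round the only candidate fails the mandatory check, so its violation is fed back; in the second round both candidates pass and the nearer one wins on the advisory score. Keeping hard constraints and soft preferences in separate lists is what lets invalid proposals be rejected without ever competing on preference.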