CogExplore: Contextual Exploration with Language-Encoded Environment Representations

Harel Biggie, Patrick Cooper, Doncey Albin, Kristen Such, Christoffer Heckman

TL;DR

The paper tackles exploring unknown environments by integrating large language models to ground semantic and temporal context into navigation. CogExplore encodes scene elements as natural language and uses a probabilistic, memory‑augmented prompting framework to select informative waypoints, balancing geometric reach with semantic cues. By combining a frontier‑based planner, an open vocabulary detection pipeline, and modular prompts, the approach achieves robust, temporally coherent exploration with explicit justification for decisions. In photorealistic Unreal Engine experiments across multiple environments, CogExplore delivers 100% success and shorter path lengths than baselines, demonstrating the practical impact of language‑grounded reasoning for search‑and‑rescue style tasks, while acknowledging simulation‑based limits and the need for real‑world validation.

Abstract

Integrating language models into robotic exploration frameworks improves performance in unmapped environments by providing the ability to reason over semantic groundings, contextual cues, and temporal states. The proposed method employs large language models (GPT-3.5 and Claude Haiku) to reason over these cues and express that reasoning in terms of natural language, which can be used to inform future states. We are motivated by the context of search-and-rescue applications where efficient exploration is critical. We find that by leveraging natural language, semantics, and tracking temporal states, the proposed method greatly reduces exploration path distance and further exposes the need for environment-dependent heuristics. Moreover, the method is highly robust to a variety of environments and noisy vision detections, as shown with a 100% success rate in a series of comprehensive experiments across three different environments conducted in a custom simulation pipeline operating in Unreal Engine.

Paper Structure

This paper contains 20 sections, 1 equation, 12 figures, 1 table.

Figures (12)

  • Figure 1: Spot characterizing its environment through its VQA model (Language Priors), searching for specific objects with its object detection model and creating projections (Object Points shown in red) surrounded by a set of navigable graph points (shown in purple).
  • Figure 2: Renderings from Unreal Engine Environments
  • Figure 3: CogExplore System Diagram
  • Figure 4: Example Runs Demonstrating Varieties of Reasoning
  • Figure 5: Path length comparisons for each method (CE-3.5, CE-H, VEFEP) on completing each of the seven tasks. The black line in each whisker plot marks the mean.
  • ...and 7 more figures