ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting

Dexter Ong, Yuezhan Tao, Varun Murali, Igor Spasojevic, Vijay Kumar, Pratik Chaudhari

TL;DR

The paper tackles task-directed navigation in unknown, unstructured environments by introducing a hierarchical, language-embedded Gaussian splatting map that jointly yields sparse semantic planning and dense geometric representation for collision-free navigation. It couples bottom-up mapping with language embeddings to create a memory-efficient, submap-based structure and a top-down two-stage planner: a discrete planner selects high-utility vantage points, while a continuous planner generates dynamically feasible trajectories under collision constraints. Task specifications and completions are driven by natural-language prompts and vision-language models, enabling open-set, re-specifiable objectives and termination checks. Real-world indoor and outdoor experiments demonstrate large-scale map construction with over a million Gaussians, competitive performance against privileged baselines, and robust open-vocabulary semantic retrieval, highlighting practical applicability for scalable, language-guided navigation.

Abstract

We address the challenge of task-oriented navigation in unstructured and unknown environments, where robots must incrementally build and reason on rich, metric-semantic maps in real time. Since tasks may require clarification or re-specification, it is necessary for the information in the map to be rich enough to enable generalization across a wide range of tasks. To effectively execute tasks specified in natural language, we propose a hierarchical representation built on language-embedded Gaussian splatting that enables both sparse semantic planning that lends itself to online operation and dense geometric representation for collision-free navigation. We validate the effectiveness of our method through real-world robot experiments conducted in both cluttered indoor and kilometer-scale outdoor environments, with a competitive ratio of about 60% against privileged baselines. Experiment videos and more details can be found on our project page: https://atlasnav.github.io
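
One way to picture the sparse semantic planning on a hierarchical, language-embedded map is a top-down query that descends from submaps to regions to objects, keeping the most relevant node at each level. The sketch below is only an illustration under stated assumptions: the greedy descent, the cosine-similarity score, the dictionary layout, and all names (`cosine`, `most_relevant`, `top_down_query`, `toy_map`) are hypothetical and are not taken from the paper. In practice the embeddings would come from a vision-language model; toy 2-D vectors stand in for them here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_relevant(nodes, query):
    """Return the node whose embedding is most similar to the query embedding."""
    return max(nodes, key=lambda n: cosine(n["embedding"], query))

def top_down_query(submaps, query):
    """Greedy descent (an assumption): best submap, then best region, then best object."""
    submap = most_relevant(submaps, query)
    region = most_relevant(submap["regions"], query)
    obj = most_relevant(region["objects"], query)
    return submap["name"], region["name"], obj["name"]

# Toy map with hypothetical names and 2-D stand-in embeddings.
toy_map = [
    {"name": "indoor_A", "embedding": [0.9, 0.1], "regions": [
        {"name": "kitchen", "embedding": [1.0, 0.2], "objects": [
            {"name": "mug", "embedding": [1.0, 0.0]},
            {"name": "chair", "embedding": [0.0, 1.0]},
        ]},
        {"name": "hallway", "embedding": [0.2, 1.0], "objects": [
            {"name": "door", "embedding": [0.1, 1.0]},
        ]},
    ]},
    {"name": "outdoor_B", "embedding": [0.1, 0.9], "regions": [
        {"name": "parking_lot", "embedding": [0.0, 1.0], "objects": [
            {"name": "car", "embedding": [0.0, 1.0]},
        ]},
    ]},
]

print(top_down_query(toy_map, [1.0, 0.0]))
```

Scoring only the sparse upper levels before touching objects keeps the query cheap, which is the property the abstract attributes to sparse semantic planning; a real system could just as well keep several candidates per level instead of a single best node.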

Paper Structure

This paper contains 27 sections, 27 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Our framework consists of three components. The front-end processing [A] extracts and compresses dense pixel-level language features from the image. The module also clusters features based on geometry and semantics in the map. The hierarchical mapper [B] runs bottom-up, ingesting the RGB and depth images and the odometric path from the robot to build a map. The top level of the map contains the submaps, the middle level the regions, and the bottom level the objects. The local map comprises the loaded submaps. The other submaps are unloaded to save memory (shown here in gray). The planning module [C] consists of a discrete planner that operates on the sparse map and generates a reference path, while the dense Gaussians in the local map are used to find the trajectory to be executed on the robot.
  • Figure 2: An illustration of the different parameters that are relevant to the submapping and collision checking process. $X_s$ denotes the current position of the robot, $X_l$ is the local goal along the path to the final goal $X_{goal}$. Submaps are loaded within the region bounded by $R_{loc}$.
  • Figure 3: Qualitative results showing the output of the VLM when the task terminates.
  • Figure 4: [A] shows the task provided to our method. [B] shows the selected submap and region in the bottom-left with highest relevance. [C] shows the rendered image from the vantage point with the highest relevance to the task. [D] shows the ground truth image.
  • Figure 5: The outdoor areas used in our experiments. The park experiments are in the areas highlighted in yellow; the parking lots are in red and blue.
  • ...and 3 more figures
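
The submap loading described in the captions of Figures 1 and 2 (submaps within the region bounded by $R_{loc}$ are loaded, the rest are unloaded to save memory) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Submap` class, the `update_local_map` function, and the choice to measure distance from the robot position $X_s$ to a single anchor point per submap are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Submap:
    """Hypothetical submap record; `anchor` is an assumed representative position."""
    submap_id: int
    anchor: Tuple[float, float, float]
    loaded: bool = False

def update_local_map(submaps: List[Submap],
                     x_s: Tuple[float, float, float],
                     r_loc: float) -> List[int]:
    """Load submaps whose anchor lies within r_loc of the robot position x_s,
    unload all others, and return the ids of the loaded (local) submaps."""
    local_ids = []
    for submap in submaps:
        submap.loaded = math.dist(submap.anchor, x_s) <= r_loc
        if submap.loaded:
            local_ids.append(submap.submap_id)
    return local_ids

# Toy example: three submaps, robot at the origin, loading radius of 5 m.
submaps = [
    Submap(0, (0.0, 0.0, 0.0)),
    Submap(1, (3.0, 4.0, 0.0)),   # exactly 5 m away, so it is still loaded
    Submap(2, (10.0, 0.0, 0.0)),  # outside the radius, so it stays unloaded
]
print(update_local_map(submaps, (0.0, 0.0, 0.0), 5.0))
```

Calling `update_local_map` again as the robot moves flips the `loaded` flags, so the dense Gaussians kept in memory stay bounded by the loading radius rather than by the size of the whole map.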