OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs

Venkata Naren Devarakonda, Raktim Gautam Goswami, Ali Umut Kaypak, Naman Patel, Rooholla Khorrambakht, Prashanth Krishnamurthy, Farshad Khorrami

TL;DR

This work presents a novel framework for real-time onboard autonomous navigation in unknown environments that change over time by integrating multi-level abstraction in both perception and planning pipelines, and demonstrates the system's efficacy on a quadruped navigating dynamic environments.

Abstract

Enabling robots to autonomously navigate unknown, complex, dynamic environments and perform diverse tasks remains a fundamental challenge in developing robust autonomous physical agents. These agents must effectively perceive their surroundings while leveraging world knowledge for decision-making. Although recent approaches use vision-language models and large language models (LLMs) for scene understanding and planning, they often rely on offline processing or offboard compute, or make simplifying assumptions about the environment and perception, which limits their real-world applicability. We present a novel framework for real-time onboard autonomous navigation in unknown environments that change over time by integrating multi-level abstraction in both perception and planning pipelines. Our system fuses data from multiple onboard sensors for localization and mapping and integrates it with open-vocabulary semantics to generate hierarchical scene graphs from a continuously updated semantic object map. The LLM-based planner uses these graphs to create multi-step plans that guide low-level controllers in executing navigation tasks specified in natural language. Because the system runs in real time, the LLM can adjust its plans based on updates to the scene graph and on task execution status, adapting continuously to new situations or replanning when the current plan cannot accomplish the task, a key advantage over static or rule-based systems. We demonstrate our system's efficacy on a quadruped navigating dynamic environments, showcasing its adaptability and robustness in diverse scenarios.
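
The replanning behavior described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_task`, `plan_fn`, `execute_fn`, `get_scene_graph`, and `max_replans` are hypothetical names, the planner is a hand-written stub standing in for the LLM, and the scene graph is a plain dictionary. The sketch only shows the control flow in which execution feedback and a fresh scene graph are passed back to the planner whenever a step fails.

```python
def run_task(task, get_scene_graph, plan_fn, execute_fn, max_replans=3):
    """Plan, execute, and replan on failure (all names are hypothetical)."""
    history, feedback = [], None
    for _ in range(max_replans + 1):
        # Query the planner with the latest scene graph and any prior feedback.
        plan = plan_fn(task, get_scene_graph(), history, feedback)
        for step in plan:
            ok, message = execute_fn(step)
            history.append(step)
            if not ok:
                feedback = message  # passed to the planner on the next attempt
                break
        else:
            return True, history  # every step of the plan succeeded
    return False, history


def demo():
    """Toy scenario: one door is blocked, so the first plan fails."""
    blocked = {"door_a"}

    def get_scene_graph():
        return {"hallway": ["door_a", "door_b"], "kitchen": ["fridge"]}

    def plan_fn(task, graph, history, feedback):
        # Stub standing in for the LLM planner: avoid a door reported as blocked.
        door = "door_b" if feedback and "door_a" in feedback else "door_a"
        return [f"goto {door}", "goto fridge"]

    def execute_fn(step):
        target = step.split()[1]
        if target in blocked:
            return False, f"{target} is blocked"
        return True, "ok"

    return run_task("go to the fridge", get_scene_graph, plan_fn, execute_fn)
```

In this toy run the first plan fails at `door_a`, the failure message is fed back, and the second plan routes through `door_b`. A real system would replace the stubs with an LLM call and low-level controllers.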

Paper Structure (29 sections, 13 figures, 2 tables, 1 algorithm)

This paper contains 29 sections, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of our OrionNav framework. The OrionNav system fuses data from onboard LiDAR and odometry sensors for robust localization and mapping, while integrating open-world semantics to produce a semantic object map of the environment. This map is then clustered into distinct rooms, and room labels are assigned using the Llama3 LLM, generating a hierarchical scene graph. An LLM-based planner utilizes this scene graph, along with user commands, to create a high-level task execution plan, guiding low-level controllers to safely and efficiently achieve designated goals.
  • Figure 2: Our method makes use of the semantic constructs of indoor environments to generate a hierarchical scene graph from the semantic object map. The objects are then clustered based on the density of the objects in the scene. An LLM is then queried with the labels of the objects in each cluster and a set of candidate room labels. Finally, the generated hierarchical graph is passed to the LLM-based planner.
  • Figure 3: Prompt templates used to generate text embeddings for each category. Each instance of '<label>' is replaced with the corresponding category name.
  • Figure 4: Overview of the system and user prompts for the LLM planner. The system prompt explains the agent's role, action primitives, and map format. The user prompt provides the map, command history, feedback, and task details. In the initial call to the LLM, command history and feedback are absent, as there is no prior interaction. Feedback includes task status and error messages from previous command executions. '<text>' represents a placeholder for the information summarized in text.
  • Figure 5: Robot Setup: OrionNav's capabilities are demonstrated on a Unitree Go2 quadrupedal robot equipped with an onboard LiDAR sensor, a stereo camera, and embedded computers with a graphics processing unit (GPU).
  • ...and 8 more figures
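
The scene-graph construction summarized in the captions for Figures 1 and 2 (cluster objects by density, then ask an LLM for a room label per cluster) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code. The captions do not name the clustering algorithm, so a small DBSCAN-style routine is used here. The parameters `eps` and `min_pts`, the function names, and the dictionary layout are all hypothetical. The LLM query is replaced by `label_room_stub`, a keyword-matching placeholder.

```python
from collections import deque
from math import dist


def density_cluster(points, eps=2.0, min_pts=2):
    """DBSCAN-style clustering; returns a cluster id per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels


def label_room_stub(object_labels, candidate_rooms):
    """Placeholder for the LLM query: pick the room with most keyword hits."""
    hints = {"kitchen": {"sink", "fridge", "oven"},
             "office": {"desk", "monitor", "chair"}}
    seen = set(object_labels)
    return max(candidate_rooms, key=lambda r: len(hints.get(r, set()) & seen))


def build_scene_graph(objects, candidate_rooms, label_room=label_room_stub):
    """Group objects into rooms and return a building -> room -> object tree."""
    ids = density_cluster([o["position"] for o in objects])
    rooms = {}
    for obj, cid in zip(objects, ids):
        if cid != -1:
            rooms.setdefault(cid, []).append(obj)
    graph = {"building": []}
    for _, members in sorted(rooms.items()):
        names = [m["label"] for m in members]
        graph["building"].append(
            {"room": label_room(names, candidate_rooms), "objects": members})
    return graph
```

A caller would pass a list of objects such as `{"label": "sink", "position": (0.0, 0.0)}` together with candidate room names. In a real pipeline, `label_room` would wrap the LLM call, and the resulting graph would be serialized into the planner's prompt.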