Decision-Driven Semantic Object Exploration for Legged Robots via Confidence-Calibrated Perception and Topological Subgoal Selection

Guoyang Zhao, Yudong Li, Weiqing Qi, Kai Zhang, Bonan Liu, Kai Chen, Haoang Li, Jun Ma

TL;DR

This work proposes a vision-based approach that explicitly addresses decision-driven semantic object exploration through confidence-calibrated semantic evidence arbitration, a controlled-growth semantic topological memory, and a semantic utility-driven subgoal selection mechanism.

Abstract

Conventional navigation pipelines for legged robots remain largely geometry-centric, relying on dense SLAM representations that are fragile under rapid motion and offer limited support for semantic decision making in open-world exploration. In this work, we focus on decision-driven semantic object exploration, where the primary challenge is not map consistency but how noisy and heterogeneous semantic observations can be transformed into stable and executable exploration decisions. We propose a vision-based approach that explicitly addresses this problem through confidence-calibrated semantic evidence arbitration, a controlled-growth semantic topological memory, and a semantic utility-driven subgoal selection mechanism. These components enable the robot to accumulate task-relevant semantic knowledge over time and select exploration targets that balance semantic relevance, reliability, and reachability, without requiring dense geometric reconstruction. Extensive experiments in both simulation and real-world environments demonstrate that the proposed mechanisms consistently improve the quality of semantic decision inputs, subgoal selection accuracy, and overall exploration performance on legged robots.

Paper Structure

This paper contains 35 sections, 10 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Decision-driven vision-only semantic object exploration for legged robots. (a) Semantic evidence arbitration calibrates scene- and object-level perception to produce decision-robust semantic targets. (b) A semantic topology represents explored locations as nodes enriched with semantic cues. (c) Subgoals are selected via utility-driven decision making rather than simple ranking.
  • Figure 2: Overview of the decision-driven semantic object exploration framework. (a) RGB image $I_t$, depth image $D_t$, instruction $\ell$, and robot state $Z_t$ are used as inputs. (b) Hierarchical perception extracts semantic evidence, which is calibrated and fused to produce a stable target $(P_t, L_t, C_f)$. (c) Explored regions are organized into a semantic topological memory with map construction and maintenance. (d) Candidate nodes are filtered and evaluated to select the next subgoal. (e) Obstacle-aware trajectory tracking and RL-based motion policies enable execution on different robot platforms.
  • Figure 3: Simulation results across five scenes with different robots. We evaluate our exploration framework in Garden, Sidewalk, Road, and two Warehouse scenarios. Experiments use three different legged robot platforms, demonstrating the framework's cross-platform adaptability.
  • Figure 4: Real-world results across five scenes. Exploration trajectories and semantic topological maps are shown for Office, Showroom, Laboratory, Living Room, and Garden, using a Unitree Go1.
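
The captions for Figures 1(c) and 2(d) describe candidate nodes being filtered and then scored by a utility that balances semantic relevance, reliability, and reachability. The paper's actual utility function is not reproduced in this summary, so the following is only a minimal sketch under stated assumptions: the `Node` fields, the linear weighted form, the weights, and the filter thresholds are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical topological node; field names are illustrative assumptions."""
    name: str
    relevance: float   # semantic match to the instruction, in [0, 1]
    confidence: float  # calibrated reliability of the evidence, in [0, 1]
    path_cost: float   # travel cost along topological edges (>= 0)


def utility(node: Node, w_rel: float = 1.0, w_conf: float = 1.0,
            w_cost: float = 0.1) -> float:
    """Assumed linear utility: reward relevance and confidence, penalize cost."""
    return (w_rel * node.relevance
            + w_conf * node.confidence
            - w_cost * node.path_cost)


def select_subgoal(nodes: List[Node], min_confidence: float = 0.3,
                   max_cost: float = 20.0) -> Optional[Node]:
    """Filter unreliable or unreachable candidates, then take the best utility."""
    candidates = [n for n in nodes
                  if n.confidence >= min_confidence and n.path_cost <= max_cost]
    if not candidates:
        return None  # nothing passes the filter; caller must fall back
    return max(candidates, key=utility)


nodes = [
    Node("doorway", relevance=0.9, confidence=0.2, path_cost=2.0),  # filtered out
    Node("desk", relevance=0.7, confidence=0.8, path_cost=5.0),
    Node("shelf", relevance=0.8, confidence=0.6, path_cost=1.0),
]
best = select_subgoal(nodes)
print(best.name if best else "no candidate")
```

In this toy input the most relevant node ("doorway") is rejected by the confidence filter, which is the difference between utility-driven selection and ranking by relevance alone.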