HANDO: Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation

Jingyuan Sun, Chaoran Wang, Mingyu Zhang, Cui Miao, Hongyu Ji, Zihan Qu, Han Sun, Bing Wang, Qingyi Si

TL;DR

The paper tackles human-centered last-mile delivery by unifying map-free navigation and dexterous loco-manipulation on a legged robot with an onboard arm. It introduces HANDO, a two-layer framework in which the top layer performs goal-conditioned autonomous exploration toward semantic targets and the bottom layer coordinates the arm and legs for physical interaction, guided by a Hand-Track trajectory generator. The navigation policy relies on vision-language grounding with graph matching and operates in three stages governed by a score $s_t$ and thresholds $\sigma_1$ and $\sigma_2$. The loco-manipulation policy is formulated as a POMDP, trained with PPO under domain randomization, and executed through closed-loop PD control. Real-world experiments in unstructured environments demonstrate end-to-end map-free navigation and human-centered handover, suggesting the framework's potential for scalable, robust delivery in dynamic settings.
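The TL;DR says only that navigation runs in three stages governed by a score $s_t$ and thresholds $\sigma_1$ and $\sigma_2$; it does not define the stages or how the score is computed. The following is a minimal sketch of a two-threshold gate, assuming $\sigma_1 < \sigma_2$, that a higher $s_t$ means a stronger match with the semantic target, and that the stages are "explore", "approach", and "reached". These stage names and the comparison rule are hypothetical illustrations, not the paper's definitions.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical names for the three navigation stages."""
    EXPLORE = 1   # target not yet grounded: keep exploring
    APPROACH = 2  # candidate match found: move toward it and re-verify
    REACHED = 3   # confident match: hand control to the manipulation layer


def select_stage(s_t: float, sigma_1: float, sigma_2: float) -> Stage:
    """Map a grounding score s_t to a stage using two thresholds.

    Assumes sigma_1 < sigma_2 and that a higher s_t means a stronger
    vision-language / graph match. Scores equal to a threshold fall into
    the higher stage.
    """
    if not sigma_1 < sigma_2:
        raise ValueError("expected sigma_1 < sigma_2")
    if s_t < sigma_1:
        return Stage.EXPLORE
    if s_t < sigma_2:
        return Stage.APPROACH
    return Stage.REACHED


if __name__ == "__main__":
    # Illustrative threshold values only; the paper's values are not given here.
    for score in (0.2, 0.55, 0.9):
        print(score, select_stage(score, sigma_1=0.4, sigma_2=0.8).name)
```

With the illustrative thresholds 0.4 and 0.8, the three scores map to EXPLORE, APPROACH, and REACHED respectively. A hysteresis band between the two thresholds is one common reason to use two thresholds rather than one, since it keeps a noisy score from flipping the robot between exploring and committing to a target.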

Abstract

Seamless loco-manipulation in unstructured environments requires robots to combine autonomous exploration with whole-body control for physical interaction. In this work, we introduce HANDO (Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation), a two-layer framework designed for legged robots equipped with manipulators to perform human-centered mobile manipulation tasks. The first layer uses a goal-conditioned autonomous exploration policy to guide the robot to semantically specified targets, such as a black office chair in a dynamic environment. The second layer employs a unified whole-body loco-manipulation policy that coordinates the arm and legs for precise interaction tasks, for example, handing a drink to a person seated on the chair. We have conducted an initial deployment of the navigation module and will continue to pursue finer-grained deployment of whole-body loco-manipulation.
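The TL;DR states that the whole-body policy runs through a closed PD loop and is trained with domain randomization, but it gives no gains, joint counts, or randomization ranges. The sketch below shows only the generic pattern, assuming the policy emits joint position targets that a PD law converts to torques. The independent-joint model, the gains `kp`/`kd`, and the inertia range are illustrative assumptions, not values from the paper.

```python
import numpy as np


def pd_torques(q_target, q, q_dot, kp, kd):
    """PD law with zero target velocity: tau = kp * (q_target - q) - kd * q_dot."""
    return kp * (q_target - q) - kd * q_dot


def simulate(q_target, kp=20.0, kd=2.0, dt=0.005, steps=2000, seed=0):
    """Track fixed joint targets on a toy model of independent joints.

    Each joint obeys inertia * q_ddot = tau, with the inertia drawn at
    random per episode as a toy stand-in for domain randomization.
    Returns the final joint positions.
    """
    rng = np.random.default_rng(seed)
    q_target = np.asarray(q_target, dtype=float)
    n = q_target.shape[0]
    inertia = rng.uniform(0.05, 0.15, size=n)  # hypothetical range, kg*m^2
    q = np.zeros(n)
    q_dot = np.zeros(n)
    for _ in range(steps):
        tau = pd_torques(q_target, q, q_dot, kp, kd)
        q_ddot = tau / inertia
        # Semi-implicit Euler: update velocity first, then position.
        q_dot = q_dot + q_ddot * dt
        q = q + q_dot * dt
    return q


if __name__ == "__main__":
    print(simulate([0.3, -0.5, 1.0]))
```

Randomizing the plant (here, only inertia) while keeping the controller fixed is the basic idea behind training a policy that tolerates model mismatch; a real setup would randomize many more parameters, such as friction, payload, and latency.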

Paper Structure

This paper contains 14 sections, 6 equations, and 2 figures.

Figures (2)

  • Figure 1: Overview of HANDO. The two-layer framework couples mapless navigation (Layer 1) with whole-body loco-manipulation (Layer 2), where navigation outputs velocity/joint commands and manipulation uses Hand-Track with a diffusion policy to generate coordinated grasping and handover.
  • Figure 2: Snapshots of real-world experiments. The task required the robot to deliver a beverage and hand it over to a seated human.
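Figure 1 describes Layer 1 emitting velocity commands for navigation and Layer 2 taking over for grasping and handover. The source does not say how control passes between the layers, so the following is a minimal sketch assuming a simple distance-based switch. `NavCommand`, `ManipCommand`, `handover_radius`, and the straight-line navigator are hypothetical placeholders, not the paper's exploration policy or diffusion policy.

```python
import math
from dataclasses import dataclass


@dataclass
class NavCommand:
    """Layer 1 output: planar base velocity (hypothetical format)."""
    vx: float
    vy: float


@dataclass
class ManipCommand:
    """Layer 2 output: a symbolic arm/leg action (placeholder)."""
    action: str


def layer1_navigate(pos, goal, max_speed=0.5):
    """Toy stand-in for the exploration policy: head straight for the goal."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    # Speed is capped at max_speed and shrinks to dist when closer than that.
    scale = min(max_speed, dist) / dist if dist > 0 else 0.0
    return NavCommand(dx * scale, dy * scale)


def hierarchical_step(pos, goal, handover_radius=0.6):
    """Choose the active layer: navigate until within handover_radius."""
    if math.dist(pos, goal) > handover_radius:
        return layer1_navigate(pos, goal)
    return ManipCommand("handover")


def run_episode(start, goal, dt=0.2, max_steps=200):
    """Integrate Layer 1 velocity commands until Layer 2 takes over."""
    pos = list(start)
    log = []
    for _ in range(max_steps):
        cmd = hierarchical_step(pos, goal)
        log.append(type(cmd).__name__)
        if isinstance(cmd, ManipCommand):
            break
        pos[0] += cmd.vx * dt
        pos[1] += cmd.vy * dt
    return pos, log


if __name__ == "__main__":
    final_pos, log = run_episode((0.0, 0.0), (3.0, 4.0))
    print(len(log), log[-1], final_pos)
```

A hard distance switch is the simplest possible arbitration; a real system would more likely gate the handoff on the navigation layer's own confidence signal together with the manipulator's reachability.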