Hierarchical Planning and Control for Box Loco-Manipulation

Zhaoming Xie, Jonathan Tseng, Sebastian Starke, Michiel van de Panne, C. Karen Liu

TL;DR

This paper addresses the problem of enabling a physics-based humanoid to perform box rearrangement in cluttered environments by integrating locomotion and manipulation within a four-level hierarchical stack. A kinodynamic planner provides waypoint constraints, a diffusion-based mid-level motion generator produces realistic whole-body trajectories, and imitation-based RL policies execute the low-level motor skills, including an object-aware manipulation policy. The key contributions are a hierarchical planning-and-control framework, the use of diffusion models with bidirectional root control for robust locomotion planning, and demonstrated generalization of a single pick-up/put-down motion to objects of varying weights and heights. The approach is relevant to both virtual humans and robotics, and the paper identifies unified models and dynamic replanning as avenues for improvement.

Abstract

Humans perform everyday tasks using a combination of locomotion and manipulation skills. Building a system that can handle both skills is essential to creating virtual humans. We present a physically simulated human capable of solving box rearrangement tasks, which require a combination of both skills. We propose a hierarchical control architecture, where each level solves the task at a different level of abstraction, and the result is a physics-based simulated virtual human capable of rearranging boxes in a cluttered environment. The control architecture integrates a planner, diffusion models, and physics-based motion imitation of sparse motion clips using deep reinforcement learning. Boxes can vary in size, weight, shape, and placement height. Code and trained control policies are provided.

Paper Structure (31 sections, 8 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: System overview. We design four motion primitives for locomotion and manipulation which can be combined to accomplish arrangement tasks in a 3D scene according to a simple motion primitive graph. Each primitive consists of a policy that controls the character in a physically simulated environment. The locomotion primitives, "Walk-only" and "Walk-and-carry", leverage a diffusion model to generate kinematic reference trajectories that guide the policies. Components with the same color are identical.
  • Figure 2: The two locomotion system primitives: walk-only (top) and walk-and-carry (bottom).
  • Figure 3: The diffusion model architecture.
  • Figure 4: Enforcing continuity between short sequences is sufficient to create globally smooth motions.
  • Figure 5: Reference motion for the pick up and put down motion.
  • ...and 5 more figures
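
The Figure 1 caption describes four motion primitives combined according to a simple motion primitive graph. As a rough illustration only, the sketch below encodes one plausible graph and searches it for a primitive sequence. The primitive identifiers, the edges between them, and the helper names `primitive_sequence` and `rearrange` are assumptions made for this example and are not taken from the paper or its released code.

```python
from collections import deque

# Hypothetical primitive graph: names and edges are assumptions that loosely
# follow the Figure 1 caption (two locomotion and two manipulation primitives).
PRIMITIVE_GRAPH = {
    "walk_only": ["pick_up"],
    "pick_up": ["walk_and_carry"],
    "walk_and_carry": ["put_down"],
    "put_down": ["walk_only"],
}


def primitive_sequence(graph, start, goal):
    """Breadth-first search for the shortest primitive sequence from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is not reachable from start


def rearrange(num_boxes):
    """Repeat one walk / pick up / carry / put down cycle per box."""
    cycle = primitive_sequence(PRIMITIVE_GRAPH, "walk_only", "put_down")
    return cycle * num_boxes


if __name__ == "__main__":
    print(rearrange(2))
```

In this sketch each graph node stands in for a full control policy; in the system described above, the locomotion nodes would additionally be guided by kinematic reference trajectories from the diffusion model.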