Preferenced Oracle Guided Multi-mode Policies for Dynamic Bipedal Loco-Manipulation

Prashanth Ravichandar, Lokesh Krishna, Nikhil Sobanbabu, Quan Nguyen

TL;DR

This work proposes Preferenced Oracle Guided Multi-mode Policies (OGMP), which learn a single policy that masters all the required modes and the preferred sequence of transitions for uni-object loco-manipulation tasks. Hybrid automata serve as oracles that generate references with continuous dynamics and discrete mode jumps, which guide policy optimization through bounded exploration.

Abstract

Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis relies on training low-level skill policies and explicitly switching between them with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate into an optimal approach for dribbling, and eventually kick a goal: a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy mastering all the required modes and the preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automata as oracles to generate references with continuous dynamics and discrete mode jumps, and use these references to perform guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
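
As a rough illustration of the oracle idea in the abstract, the Python sketch below models a hybrid automaton with per-mode continuous flows and guard-triggered discrete mode jumps, then rolls it out to produce a reference. It is not the authors' implementation. The planar point-mass model, the class and function names (HybridAutomatonOracle, within_bound), the guard radii, and the speeds are all assumptions made for illustration; only the mode names (reach, manipulate, kick) are borrowed from the Figure 5 caption.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

# Mode names follow the Figure 5 caption; guards and flows below are illustrative.
REACH, MANIPULATE, KICK = "reach", "manipulate", "kick"


@dataclass(frozen=True)
class State:
    robot: Point  # planar robot position (a stand-in for the CoM)
    ball: Point   # planar object position


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def step_towards(p: Point, q: Point, step: float) -> Point:
    """Move p towards q by at most `step`, landing exactly on q when close."""
    d = dist(p, q)
    if d <= step:
        return q
    return (p[0] + step * (q[0] - p[0]) / d, p[1] + step * (q[1] - p[1]) / d)


class HybridAutomatonOracle:
    """Hypothetical oracle: continuous flow per mode plus guarded mode jumps."""

    def __init__(self, goal: Point, contact_radius: float = 0.3,
                 kick_radius: float = 2.0, speed: float = 1.0, dt: float = 0.1):
        self.goal = goal
        self.contact_radius = contact_radius  # assumed guard: reach -> manipulate
        self.kick_radius = kick_radius        # assumed guard: manipulate -> kick
        self.speed = speed
        self.dt = dt

    def jump(self, mode: str, s: State) -> str:
        """Discrete dynamics: switch mode when the active mode's guard holds."""
        if mode == REACH and dist(s.robot, s.ball) < self.contact_radius:
            return MANIPULATE
        if mode == MANIPULATE and dist(s.ball, self.goal) < self.kick_radius:
            return KICK
        return mode

    def flow(self, mode: str, s: State) -> State:
        """Continuous dynamics of the active mode, integrated over one step."""
        step = self.speed * self.dt
        if mode == REACH:  # robot approaches a stationary ball
            return State(step_towards(s.robot, s.ball, step), s.ball)
        if mode == MANIPULATE:  # ball is dribbled to the goal, robot follows
            ball = step_towards(s.ball, self.goal, step)
            return State(step_towards(s.robot, ball, step), ball)
        # KICK: the ball travels faster while the robot holds its position
        return State(s.robot, step_towards(s.ball, self.goal, 3.0 * step))

    def reference(self, s: State, mode: str, horizon: int) -> List[Tuple[str, State]]:
        """Roll the automaton forward to get a (mode, state) reference."""
        ref = []
        for _ in range(horizon):
            mode = self.jump(mode, s)
            s = self.flow(mode, s)
            ref.append((mode, s))
        return ref


def within_bound(actual: State, ref_state: State, bound: float) -> bool:
    """Bounded-exploration check: is the actual state near the reference?"""
    return (dist(actual.robot, ref_state.robot) <= bound
            and dist(actual.ball, ref_state.ball) <= bound)


oracle = HybridAutomatonOracle(goal=(5.0, 0.0))
ref = oracle.reference(State(robot=(0.0, 0.0), ball=(1.0, 0.0)), REACH, horizon=80)
print([mode for mode, _ in ref][::10])  # active mode at every tenth step
```

In a training loop, one simple way to keep exploration bounded would be to end an episode as soon as within_bound fails against the current reference state. The paper's actual termination and reward design may differ.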

Paper Structure (16 sections, 5 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Dynamic contact-rich whole body loco-manipulation tasks on different bipedal and humanoid robots with Preferenced OGMP. Accompanying project website: https://indweller.github.io/ogmplm/
  • Figure 2: A multi-mode oracle designed as a hybrid automaton for the reach-avoid task. Dotted lines show the generated references in reach (green, left) and avoid (red, right) modes.
  • Figure 3: Overview of the proposed framework: the environment dynamically queries the multi-mode oracle for a reference. A bounded exploration around the reference is then performed to learn dynamic loco-manipulation effectively.
  • Figure 4: Keyframes from the simulation: a) First row - moving the box along the slope. b) Second row - moving the box omnidirectionally along the plane c) Third row - dribbling a soccer ball in the soccer-stop task. In each case, the robot learns to manipulate the object to the target location (green circle).
  • Figure 5: Selected CoM states in the soccer-kick task for Berkeley Humanoid. The colored regions indicate the active modes - reach (red), manipulate (blue), kick (green).
  • ...and 3 more figures

Theorems & Definitions (2)

  • Remark 1
  • Remark 2