Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach

Dohyeong Kim; Hyeokjin Kwon; Junseok Kim; Gunmin Lee; Songhwai Oh

TL;DR

This work simplifies reward shaping for complex RL tasks by defining multiple intuitive reward and cost functions for each stage of a task, and introduces a practical CMORL algorithm that maximizes the objectives based on these rewards while satisfying the constraints defined by the costs.

Abstract

As the complexity of tasks addressed through reinforcement learning (RL) increases, the definition of reward functions has also become highly complicated. We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies. First, instead of a single reward function composed of various terms, we define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework. Second, for tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage. Finally, we introduce a practical CMORL algorithm that maximizes objectives based on these rewards while satisfying constraints defined by the costs. The proposed method has been successfully demonstrated across a variety of acrobatic tasks in both simulation and real-world environments. In comparisons against existing RL and constrained RL algorithms, it has also been shown to perform the tasks successfully. Our code is available at https://github.com/rllab-snu/Stage-Wise-CMORL.
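
For orientation, the kind of problem the abstract describes can be written in the generic constrained multi-objective form below. This is a sketch under assumed notation, not necessarily the paper's own: the symbols $N$ (number of rewards), $M$ (number of costs), $d_k$ (cost thresholds), and $\gamma$ (discount factor) are illustrative names that do not appear in the text above.

```latex
% Generic CMORL objective (assumed notation): maximize N reward returns
% subject to M cost-return constraints.
\begin{aligned}
\max_{\pi} \quad & \big( J_{R_1}(\pi), \dots, J_{R_N}(\pi) \big) \\
\text{s.t.} \quad & J_{C_k}(\pi) \le d_k, \qquad k = 1, \dots, M, \\
\text{where} \quad & J_{f}(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} f(s_t, a_t) \right].
\end{aligned}
```

Under this reading, each reward $R_i$ and cost $C_k$ would be one of the simple per-stage terms, so no hand-tuned weighted sum of terms is needed to form a single reward.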

Paper Structure (23 sections, 4 equations, 6 figures, 1 table)

INTRODUCTION
RELATED WORK
Constrained Multi-Objective Reinforcement Learning
Reinforcement Learning for Legged Robots
BACKGROUND
Constrained Multi-Objective Markov Decision Processes
CMORL Problem Setup
PROPOSED METHOD
Stage-Wise Reward Shaping
Stage Transitions
Reward and Cost Functions
CMORL Policy Update
Sim-to-Real Techniques
EXPERIMENTS
Environmental Setup
...and 8 more sections

Figures (6)

Figure 2: Overview of the proposed framework. An environment provides multiple rewards and costs, and critics compute value estimates for each reward and cost. These estimates are aggregated to calculate the overall advantage, as detailed in the CMORL Policy Update section, which is subsequently used for policy updates. Additionally, a stage scheduler updates the current stage based on a user-defined rule for stage transitions.
Figure 3: Example of stage transitions for the back-flip. The task starts in the stand stage. Upon receiving a back-flip command, the robot attempts to sit. When the base height drops below $0.25\,\mathrm{m}$, it transitions to the jump stage. During this stage, the robot attempts to jump, transitioning to the air stage once all feet detach from the ground. After completing the aerial motion, the robot transitions to the landing stage as soon as at least one foot makes contact with the ground.
Figure 4: Snapshots of motion sequences generated by trained policies. The first two rows show the Unitree Go1 robot performing side-roll and two-hand walk tasks in the real-world environment. The remaining two rows illustrate the Go1 robot performing the side-flip task and the H1 robot performing the back-flip task in simulation.
Figure 5: Simulation experiment results. The first two rows show the changes in height and body orientation, while the last row indicates whether the specified parts of the robot are in contact with the ground (black) or not (white) in each task.
Figure 6: Sim-to-real experimental results. The graph shows the position changes of three joints—hip, thigh, and knee—in the left front leg, along with the body orientation over time for each task. The solid line represents the real-world data, while the dotted line indicates the simulation results.
...and 1 more figure
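
The back-flip stage transitions described in the Figure 3 caption can be sketched as a small state machine. This is a minimal illustration, not the authors' scheduler: the names `Stage`, `Observation`, `next_stage`, and `run`, the `SIT` stage label, and the observation fields are all hypothetical. Only the 0.25 m height threshold and the foot-contact conditions come from the caption, and any additional condition for "completing the aerial motion" is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    STAND = auto()
    SIT = auto()   # assumed label for the "attempts to sit" phase
    JUMP = auto()
    AIR = auto()
    LAND = auto()


@dataclass
class Observation:
    # Hypothetical fields; a real environment would expose its own signals.
    flip_commanded: bool
    base_height: float      # metres
    feet_in_contact: int    # number of feet touching the ground


SIT_HEIGHT_THRESHOLD = 0.25  # metres, from the Figure 3 caption


def next_stage(stage: Stage, obs: Observation) -> Stage:
    """Return the stage after applying the caption's transition rules once."""
    if stage is Stage.STAND and obs.flip_commanded:
        return Stage.SIT
    if stage is Stage.SIT and obs.base_height < SIT_HEIGHT_THRESHOLD:
        return Stage.JUMP
    if stage is Stage.JUMP and obs.feet_in_contact == 0:
        return Stage.AIR
    if stage is Stage.AIR and obs.feet_in_contact >= 1:
        return Stage.LAND
    return stage  # no rule fired: stay in the current stage


def run(observations, start=Stage.STAND):
    """Apply next_stage over a sequence and collect the visited stages."""
    stages, stage = [], start
    for obs in observations:
        stage = next_stage(stage, obs)
        stages.append(stage)
    return stages
```

A scheduler of this shape makes at most one transition per step, so each stage is active for at least one step and its stage-specific rewards and costs can be evaluated before the next transition.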
