RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards

Fatemeh Zargarbashi; Jin Cheng; Dongho Kang; Robert Sumner; Stelian Coros

TL;DR

A learning-based control framework that uses keyframing to incorporate high-level objectives into natural locomotion for legged robots. Its multi-critic formulation significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative.

Abstract

This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives into natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with a specific time-to-arrival. Through simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.
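The abstract does not spell out how the multi-critic algorithm combines dense and sparse rewards. As a rough illustration only, the sketch below assumes one common multi-critic pattern: each reward group gets its own value estimates, advantages are computed per group with GAE, normalized separately, and then summed with weights. The function names, the group names `dense` and `sparse`, the toy numbers, and the weights are all hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one episode.

    `values` holds len(rewards) + 1 entries; the last entry is the
    bootstrap value (0.0 when the episode terminates).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize(xs, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit std."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / (sigma + eps) for x in xs]


def mixed_advantage(groups, weights):
    """Combine per-group advantages (assumed scheme, not the paper's).

    groups:  name -> (rewards, values) with one critic's values per group
    weights: name -> float, relative importance of each reward group
    """
    per_group = {name: normalize(gae(r, v)) for name, (r, v) in groups.items()}
    horizon = len(next(iter(per_group.values())))
    return [
        sum(weights[name] * per_group[name][t] for name in per_group)
        for t in range(horizon)
    ]


# Toy episode: a dense style-like reward at every step and a sparse
# keyframe-like reward only at the final step (hypothetical numbers).
groups = {
    "dense": ([0.1, 0.2, 0.1, 0.2], [0.5, 0.4, 0.3, 0.2, 0.0]),
    "sparse": ([0.0, 0.0, 0.0, 1.0], [0.2, 0.4, 0.6, 0.8, 0.0]),
}
advantage = mixed_advantage(groups, {"dense": 1.0, "sparse": 1.0})
```

Because each group is normalized before mixing, the weights act on comparable scales regardless of how large or how rare each raw reward is, which is one plausible reason such a scheme could need less reward-scale tuning than a single critic over a summed reward.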

Paper Structure (23 sections, 11 equations, 8 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Reinforcement Learning for Legged Robots
Natural Motion for Characters and Robots
Method
Problem Setup
Multi-Critic RL for Dense-Sparse Reward Mixture
Transformer-based Keyframe Encoding
Results
Keyframe Matching
Multi-Critic RL
Future Goal Anticipation
Hardware Deployment
Discussion
Appendix
...and 8 more sections

Figures (8)

Figure 1: RobotKeyframing: Locomotion policy trained with our framework meets the keyframes with position and full posture targets (yellow) at specified times on hardware experiments.
Figure 2: Multi-Critic RL.
Figure 3: Policy with transformer-based keyframe encoder.
Figure 4: a) Horizontal trajectories of the robot base given two sets of position goals (dots). b) Specifying different temporal profiles generates diverse behaviors for the same position goal.
Figure 5: Snapshots of the robot motion given keyframes with full postures: moving forward (top), jumping (middle) and raising the paw up (bottom). Target keyframes are displayed in yellow.
...and 3 more figures
