Adaptive Tracking of a Single-Rigid-Body Character in Various Environments
Taesoo Kwon, Taehong Gu, Jaewon Ahn, Yoonsang Lee
TL;DR
The paper addresses robust locomotion adaptation for simulated characters in environments not seen during training. It models the full-body character as a single rigid body (SRB) using centroidal dynamics and trains a reinforcement learning policy to track a reference motion; the reduced state and action dimensions make learning sample-efficient. At runtime, the SRB motion is transformed into plausible full-body motion via a precomputed delta and momentum-mapped inverse kinematics, and learned policies can be switched and blended without additional learning. The resulting policies adapt to uneven terrain and external pushes while training within 30 minutes on an ultraportable laptop, far less than full-body DRL methods such as DeepMimic, which suggests a practical route to efficient character control across varied environments.
Abstract
Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach for this goal, a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we can obtain a policy that is capable of adapting to various unobserved environmental changes and controller transitions without requiring any additional learning. Due to the reduced dimension of state and action space, the learning process is sample-efficient. The final full-body motion is kinematically generated in a physically plausible way, based on the state of the simulated SRB character. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. We demonstrate that our policy, efficiently trained within 30 minutes on an ultraportable laptop, has the ability to cope with environments that have not been experienced during learning, such as running on uneven terrain or pushing a box, and transitions between learned policies, without any additional learning.
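To make the single-rigid-body idea concrete, the following is a minimal sketch of one simulation step for an SRB under gravity and contact forces, using the standard Newton-Euler equations about the center of mass. It is an illustration under stated assumptions, not the paper's implementation: the contact forces are simply given as inputs here, whereas the paper obtains the SRB motion by solving a QP. The names `SRBState` and `step_srb`, the y-up gravity vector, and the semi-implicit Euler integrator are all hypothetical choices made for this example. Orientation is not integrated; only centroidal angular momentum is tracked.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class SRBState:
    """Hypothetical minimal SRB state (world frame)."""
    com: Vec3           # center-of-mass position
    lin_vel: Vec3       # center-of-mass velocity
    ang_momentum: Vec3  # angular momentum about the center of mass


def step_srb(state: SRBState,
             mass: float,
             contacts: List[Tuple[Vec3, Vec3]],
             dt: float,
             gravity: Vec3 = (0.0, -9.81, 0.0)) -> SRBState:
    """Advance the SRB by dt with semi-implicit Euler.

    contacts: list of (contact_point, contact_force) pairs in the world frame.
    Assumption: forces are supplied by the caller rather than solved for.
    """
    net_force = scale(gravity, mass)
    net_torque: Vec3 = (0.0, 0.0, 0.0)
    for point, force in contacts:
        net_force = add(net_force, force)
        # Torque about the COM from a force applied at a contact point.
        net_torque = add(net_torque, cross(sub(point, state.com), force))

    # Linear momentum: m * dv/dt = sum of forces (including gravity).
    lin_vel = add(state.lin_vel, scale(net_force, dt / mass))
    com = add(state.com, scale(lin_vel, dt))
    # Angular momentum: dL/dt = sum of torques about the COM.
    ang_momentum = add(state.ang_momentum, scale(net_torque, dt))
    return SRBState(com, lin_vel, ang_momentum)
```

In this sketch, a contact force that exactly cancels gravity and acts directly below the center of mass leaves the state unchanged, while the same force applied at a horizontal offset adds angular momentum. A tracking policy in this setting would choose actions so that the resulting SRB state stays close to the reference motion.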
