I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning

Yashuai Yan; Esteve Valls Mascaro; Tobias Egle; Dongheui Lee

TL;DR

Humanoid motion imitation must balance visual fidelity with physics-based feasibility. The authors formulate motion imitation as a constrained refinement of retargeted human motions, using bounded residual reinforcement learning within a constrained Markov decision process (CMDP) that enforces $D(T(s_t, \pi(s_t)), f(h_{t+1})) < \delta$ while preserving motion style. A single policy is trained across five robots using domain randomization and an Automatic Priority Scheduler, which lets it learn from a large Human MoCap dataset (≈9K motions) in about 15 hours without per-robot reward tuning. The approach yields robust, physics-consistent imitation across diverse humanoids, with substantial improvements in success rate and motion style preservation, enabling scalable deployment of humanoid robots in varied tasks.
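The constraint above can be read as a per-step check. As a minimal sketch, interpreting $T(s_t, \pi(s_t))$ as the next pose reached in simulation and $f(h_{t+1})$ as the retargeted reference pose, and assuming D is the mean Euclidean distance between corresponding joint positions (the summary does not define D), a violation test might look like this. The function names are hypothetical.

```python
import numpy as np


def constraint_distance(sim_next_pose: np.ndarray, ref_next_pose: np.ndarray) -> float:
    """Hypothetical D: mean Euclidean distance between corresponding joints.

    Both arrays have shape (num_joints, 3): the simulated next pose
    T(s_t, pi(s_t)) and the retargeted reference pose f(h_{t+1}).
    """
    per_joint = np.linalg.norm(sim_next_pose - ref_next_pose, axis=-1)
    return float(np.mean(per_joint))


def constraint_satisfied(sim_next_pose: np.ndarray, ref_next_pose: np.ndarray, delta: float) -> bool:
    """Return True when D(T(s_t, pi(s_t)), f(h_{t+1})) < delta for this step."""
    return constraint_distance(sim_next_pose, ref_next_pose) < delta
```

In an RL environment, one common way to act on a violation (assumed here, not stated in the summary) is to terminate the episode early, which keeps rollouts close to the reference motion and shrinks the space the policy has to explore.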

Abstract

Humanoid robots have the potential to mimic human motions with high visual fidelity, yet translating these motions into practical, physical execution remains a significant challenge. Existing techniques in the graphics community often prioritize visual fidelity over physics-based feasibility, which hinders the deployment of bipedal systems in practical applications. This paper addresses these issues through bounded residual reinforcement learning, producing high-quality, physics-based motion imitation on legged humanoid robots that enhances motion resemblance while successfully following the reference human trajectory. Our framework, Imitation to Control Humanoid Robots Through Bounded Residual Reinforcement Learning (I-CTRL), reformulates motion imitation as a constrained refinement over non-physics-based retargeted motions. I-CTRL excels in motion imitation with simple and unique rewards that generalize across five robots. Moreover, our framework introduces an automatic priority scheduler to manage large-scale motion datasets, enabling efficient training of a unified RL policy across diverse motions. The proposed approach signifies a crucial step forward in advancing the control of bipedal robots, emphasizing the importance of aligning visual and physical realism for successful motion imitation.
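The abstract does not detail how the automatic priority scheduler assigns priorities. As a minimal sketch, assuming priority is driven by each motion's observed failure rate (a common choice for curricula over large motion datasets, but an assumption here), a hypothetical `PriorityScheduler` could choose which motions to train on like this:

```python
import numpy as np


class PriorityScheduler:
    """Hypothetical scheduler: motions that fail more often are sampled more often."""

    def __init__(self, num_motions: int, eps: float = 0.05, seed: int = 0):
        self.attempts = np.zeros(num_motions)
        self.failures = np.zeros(num_motions)
        self.eps = eps  # floor so already-mastered motions are still revisited
        self.rng = np.random.default_rng(seed)

    def probabilities(self) -> np.ndarray:
        # Per-motion failure rate; motions never tried get rate 1.0 so they are visited early.
        rate = np.where(self.attempts > 0, self.failures / np.maximum(self.attempts, 1), 1.0)
        weights = rate + self.eps
        return weights / weights.sum()

    def sample(self, batch_size: int) -> np.ndarray:
        # Draw motion indices for the next batch of parallel environments.
        return self.rng.choice(len(self.attempts), size=batch_size, p=self.probabilities())

    def update(self, motion_ids, failed) -> None:
        # np.add.at accumulates correctly when the same motion id appears twice in a batch.
        np.add.at(self.attempts, motion_ids, 1)
        np.add.at(self.failures, motion_ids, np.asarray(failed, dtype=float))
```

The `eps` floor keeps already-mastered motions in the training mix so the policy is less likely to forget them, while untried motions start at the maximum weight.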

Paper Structure (20 sections, 1 equation, 5 figures, 7 tables, 1 algorithm)

Introduction
Methodology
Problem formulation
Human-to-robot pose retargeting
Bounded residual reinforcement learning
Rewards design
Domain randomization
Automatic priority scheduler
Experiments
Human MoCap dataset
Humanoid robots
Experimental setting
Metrics
Evaluation
Ablation Study

Figures (5)

Figure 1: Various bipedal humanoid robots imitate dynamic human motions through a reinforcement learning model, named I-CTRL. Each block shows diverse behaviors for the G1, JVRC-1, ATLAS, BRUCE, and H1 robots.
Figure 2: Architecture overview of our I-CTRL framework. First, we train ImitationNet [imitationnet] to retarget human motions onto different robots. ImitationNet provides stylistic robot poses $\mathbf{q}^{ref}_{1:T}$ that resemble the original human motion but are not physically feasible to execute. We then use our I-CTRL framework to train a whole-body controller that refines the reference motions in a physics simulator. I-CTRL introduces additional constraints into a standard residual reinforcement learning process. These constraints ensure the physical plausibility of diverse human motions while substantially reducing the exploration space for reinforcement learning, allowing a control policy to be trained across thousands of dynamic human movements within hours.
Figure 3: Quantitative evaluation of DeepMimic [deepmimic] and AMP [amp] versus our proposed approach, with and without constraints.
Figure 4: Comparison of the performance of DeepMimic [deepmimic] and AMP [amp] versus our approach. Here, the reference is the robot motion retargeted from a real human walking sequence using ImitationNet [imitationnet]. The reference robot (green) ignores physics for visualization purposes.
Figure 5: Control of different bipedal humanoid robots for various motions. We showcase the results of our proposed method, I-CTRL, imitating various dynamic human motions described by the given text. The simulation is conducted with domain randomization on uneven terrain.
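Figure 2 describes I-CTRL as a residual policy that refines ImitationNet's reference poses $\mathbf{q}^{ref}_{1:T}$ while keeping exploration small. As a minimal sketch, assuming the policy outputs joint-position residuals that are squashed with tanh and scaled by a fixed bound (both assumptions, not details given in this summary), a bounded residual action could be computed as follows. The function name is hypothetical.

```python
import numpy as np


def bounded_residual_action(q_ref_next: np.ndarray, raw_policy_output: np.ndarray, bound) -> np.ndarray:
    """Target joint positions = retargeted reference + residual limited to [-bound, bound].

    `bound` may be a scalar or a per-joint array (broadcast against the joints).
    tanh maps the unbounded network output into (-1, 1); scaling by `bound`
    means the policy can only nudge the reference pose, never replace it.
    """
    residual = bound * np.tanh(raw_policy_output)
    return q_ref_next + residual
```

Because the correction can never exceed `bound`, even an untrained policy produces targets near the reference motion. This is one way a residual formulation narrows the exploration space; a low-level joint controller would then track the returned targets.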
