CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation

Zewen He, Chenyuan Chen, Dilshod Azizov, Yoshihiko Nakamura

TL;DR

The paper tackles the challenge of enabling compliant loco-manipulation for humanoid robots by integrating model-based impedance control with reinforcement learning. It introduces CoTaP, a two-stage dual-agent RL pipeline that combines SPD-manifold-based task-space compliance modulation with distillation into a compliant upper-body policy, so that compliance remains stable and adjustable online. Key innovations include decoupled upper-body compliance, Log-Euclidean stiffness interpolation, and a KL-based distillation objective that preserves coordination with the lower body. In simulations of the Unitree H1 in Isaac Gym and MuJoCo, CoTaP demonstrates adjustable compliance, improved disturbance rejection, and balanced torso and joint torques, suggesting potential for real-world humanoid loco-manipulation.
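The summary mentions a KL-based distillation objective but does not give its form. The sketch below shows one common version of such an objective and rests on several assumptions. It assumes both teacher and student output diagonal Gaussian action distributions, as is typical for PPO-style locomotion policies. It also assumes the loss is the batch mean of KL(teacher || student). The paper's actual KL direction, weighting and parameterization may differ, and the function names are hypothetical.

```python
import numpy as np

def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over the last (action) axis."""
    mu_p, std_p, mu_q, std_q = (np.asarray(a, dtype=float) for a in (mu_p, std_p, mu_q, std_q))
    var_p, var_q = std_p ** 2, std_q ** 2
    # Closed form per dimension: log(sq/sp) + (sp^2 + (mp - mq)^2) / (2 sq^2) - 1/2
    per_dim = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return per_dim.sum(axis=-1)

def kl_distillation_loss(teacher_mu, teacher_std, student_mu, student_std):
    """Hypothetical distillation loss: batch mean of KL(teacher || student).

    Inputs have shape (batch, action_dim). The KL direction is an assumption.
    """
    kl = diag_gaussian_kl(teacher_mu, teacher_std, student_mu, student_std)
    return float(np.mean(kl))
```

With identical teacher and student distributions the loss is zero. A unit shift of the mean in one action dimension, with unit standard deviations, contributes 0.5.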

Abstract

Humanoid whole-body locomotion control is a critical approach for humanoid robots to leverage their inherent advantages. Learning-based control methods derived from retargeted human motion data provide an effective means of addressing this problem. However, because most current human datasets lack measured force data and learning-based robot control is largely position-based, achieving appropriate compliance during interaction with real environments remains challenging. This paper presents the Compliant Task Pipeline (CoTaP), a pipeline that leverages compliance information within the learning-based control structure of humanoid robots. We propose a two-stage dual-agent reinforcement learning framework combined with model-based compliance control for humanoid robots. During training, a base policy with a position-based controller is first trained. Then, in the distillation stage, the upper-body policy is combined with model-based compliance control while the lower-body agent is guided by the base policy. For upper-body control, adjustable task-space compliance can be specified and integrated with other controllers through compliance modulation on the symmetric positive definite (SPD) manifold, ensuring system stability. We validate the feasibility of the proposed strategy in simulation, primarily by comparing responses to external disturbances under different compliance settings.
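Compliance modulation on the SPD manifold follows three steps in the Log-Euclidean framework. Each stiffness matrix is mapped to the space of symmetric matrices with the matrix logarithm. The images are then interpolated linearly, and the result is mapped back with the matrix exponential. The minimal NumPy sketch below assumes symmetric positive definite stiffness matrices and a scalar blending weight `t`. It does not reproduce how the paper schedules `t` or combines the result with other controllers, and the function names are hypothetical.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    S = 0.5 * (S + S.T)  # symmetrize to remove numerical asymmetry
    w, V = np.linalg.eigh(S)
    if np.any(w <= 0.0):
        raise ValueError("stiffness matrix must be positive definite")
    return (V * np.log(w)) @ V.T  # V diag(log w) V^T

def sym_exp(L):
    """Matrix exponential of a symmetric matrix; the result is always SPD."""
    w, V = np.linalg.eigh(0.5 * (L + L.T))
    return (V * np.exp(w)) @ V.T  # V diag(exp w) V^T

def log_euclidean_interp(K0, K1, t):
    """Interpolate two SPD stiffness matrices: exp((1 - t) log K0 + t log K1)."""
    return sym_exp((1.0 - t) * spd_log(K0) + t * spd_log(K1))
```

For commuting (e.g. diagonal) stiffness matrices, this reduces to element-wise geometric interpolation. Halfway between diag(100, 400) and diag(400, 100) N/m the result is diag(200, 200), whereas plain Euclidean averaging would give diag(250, 250). For any `t` the output stays symmetric positive definite, which is what makes interpolation in the log domain attractive for stiffness modulation.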

Paper Structure

This paper contains 23 sections, 16 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: (a) Simulation of H1 under a vertical load in the low-stiffness condition. (b) Simulation of H1 under a vertical load in the high-stiffness condition. (c) Illustration of stiffness-matrix modulation on the SPD manifold. Two different stiffness matrices are first mapped to the Log-Euclidean space using the log mapping, linearly interpolated there, and finally mapped back using the exp mapping.
  • Figure 2: Overview of CoTaP pipeline. The red frame highlights the method proposed in this paper, and our objective is to implement the entire pipeline on humanoid robots.
  • Figure 3: Overview of the training framework in this study.
  • Figure 4: Left hand position error in the $z$-axis under a constant $-50\,\mathrm{N}$ payload (applied after $1.0\,\mathrm{s}$). Stiffness values are for the $z$-axis; $x,y$ are set to $300\,\mathrm{N/m}$. Position error unit: m.
  • Figure 5: Screenshots of walking control under an external impact on the left hand. The upper-body reference motion is punching. The red line represents the external impact (500 N applied for 0.05 s). The blue balls are reference points for the hands and elbows.
  • ...and 4 more figures
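The payload test in Figure 4 can be related to the simplest model of task-space compliance, an ideal spring-damper acting at the hand. The sketch below is a back-of-envelope illustration, not a prediction of the Figure 4 curves. It assumes gravity compensation and a hand at rest. It also ignores whole-body dynamics, balance control and the learned policy. Under these assumptions the steady-state offset along z is F/K_z: a 500 N/m z-stiffness sags 50/500 = 0.1 m under the 50 N load, and doubling the stiffness halves the offset. The z-stiffness values below are illustrative because the caption does not list them, and the function names are hypothetical.

```python
import numpy as np

def impedance_force(K, D, x_des, x, xdot):
    """Task-space spring-damper force (translational part): F = K (x_des - x) - D xdot."""
    return K @ (x_des - x) - D @ xdot

def static_deflection(K, f_ext):
    """Equilibrium offset x - x_des of an ideal spring under a constant external force.

    At rest (xdot = 0) the impedance force balances the load:
    K (x_des - x) + f_ext = 0  =>  x - x_des = K^{-1} f_ext.
    """
    return np.linalg.solve(K, f_ext)

if __name__ == "__main__":
    f_payload = np.array([0.0, 0.0, -50.0])  # constant -50 N load along z, as in Figure 4
    for kz in (200.0, 500.0, 1000.0):        # hypothetical z-stiffness values (N/m)
        K = np.diag([300.0, 300.0, kz])      # x, y stiffness of 300 N/m, per the caption
        dz = static_deflection(K, f_payload)[2]
        print(f"K_z = {kz:6.0f} N/m -> steady-state z offset {dz:+.3f} m")
```

The damping matrix `D` does not affect the static offset. It only shapes how quickly, and with how much overshoot, the hand settles after the load is applied.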