
A Physical Imitation Learning Pipeline for Energy-Efficient Quadruped Locomotion Assisted by Parallel Elastic Joint

Huyue Ma, Yurui Jin, Helmut Hauser, Rui Wu

Abstract

Due to brain-body co-evolution, animals' intrinsic body dynamics play a crucial role in energy-efficient locomotion, which shares control effort between active muscles and passive body dynamics -- a principle known as Embodied Physical Intelligence. In contrast, robots are often designed with a single centralised controller that typically suppresses the intrinsic body dynamics instead of exploiting them. We introduce Physical Imitation Learning (PIL), which distils a Reinforcement Learning (RL) control policy into physically implementable body responses that can be directly offloaded to passive Parallel Elastic Joints (PEJs), thereby enabling the body to imitate part of the controlled behaviour. Meanwhile, the residual policy commands the motors to recover the RL policy's performance. The result is an overall reduction in energy consumption, since part of the control policy is outsourced to the PEJs. Here we show, in simulated quadrupeds, that our PIL approach can offload up to 87% of mechanical power to PEJs on flat terrain and 18% on rough terrain. Because the body design is distilled from -- rather than jointly optimised with -- the control policy, PIL realises brain-body co-design without expanding the search space with body design parameters, providing a computationally efficient route to task-specific Embodied Physical Intelligence applicable to a wide range of joint-based robot morphologies.
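
In the notation adopted here (the paper's own symbols may differ): given the parent policy's joint torque $\tau_{\mathrm{RL}}(t)$ and joint angle $q(t)$, distillation yields a passive characteristic $\tau_{\mathrm{PEJ}}(q)$; the residual policy then commands $\tau_{\mathrm{res}}(t) = \tau_{\mathrm{RL}}(t) - \tau_{\mathrm{PEJ}}(q(t))$, and the offloaded share of mechanical power is $\int \lvert \tau_{\mathrm{PEJ}}(q)\,\dot{q} \rvert\,dt \,/\, \int \lvert \tau_{\mathrm{RL}}\,\dot{q} \rvert\,dt$.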

Figures (6)

  • Figure 1: The Physical Imitation Learning pipeline. Left: the parent control policy is trained via RL with a Cost-of-Transport-augmented reward. Right: actuator torque--angle trajectories recorded from each joint are distilled into nonlinear Parallel Elastic Joint (PEJ) characteristic curves. The residual torque after PEJ subtraction defines the active policy for motor commands. The PEJs can be physically realised using the Elastic Rolling Cam principle and are implemented similarly to a passive exoskeleton (a minimal sketch of this distillation step follows the figure list).
  • Figure 2: Power decomposition of the co-design policy across terrain levels. The solid line shows total power consumption of the co-design policy. The blue and red shaded areas decompose this into the PEJ contribution and the active motor power, respectively. The dashed line shows the baseline policy's power consumption for comparison.
  • Figure 3: Power offload ratio and power saving versus terrain level. The offload ratio compares the co-design policy with and without PEJ; the power saving compares the co-design policy (with PEJ) against the baseline (CoT only, no PEJ).
  • Figure 4: RMS velocity tracking error heatmap. Each cell shows the tracking error when deploying a given policy (row) on a given terrain level (column). Baseline denotes the baseline branch (CoT only, no PEJ); Policy L$x$ denotes the co-design policy trained at Level $x$. N/A indicates the policy cannot sustain locomotion on that terrain.
  • Figure 5: Distilled PEJ curves (dashed lines) for the four symmetric joint groups (columns), overlaid on motor torque--angle samples (dots) recorded from the parent policy during locomotion. (a)(b): co-design and baseline policies on Level 0 with 50,000 samples each. (c)(d): same comparison on Level 6. Tighter sample clustering around the curve indicates a more periodic gait and greater passive offloading. (e): co-design PEJ curves overlaid for Levels 0, 2, 3, 4, and 6 (darker shading indicates higher difficulty).
  • ...and 1 more figure
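
As a concrete illustration of the distillation and residual-command steps described in the Figure 1 caption, the sketch below fits a nonlinear PEJ characteristic curve to recorded torque--angle samples, computes the residual motor command, and estimates the power offload ratio. The function names, the polynomial curve family, and the fitting procedure are illustrative assumptions on our part; the paper's actual distillation method for the Elastic Rolling Cam realisation may differ.

    import numpy as np

    def fit_pej_curve(q_samples, tau_samples, degree=5):
        # Least-squares polynomial fit of the passive torque-angle
        # characteristic tau_pej(q) to samples recorded from the
        # parent RL policy during locomotion (Figure 1, right).
        coeffs = np.polyfit(q_samples, tau_samples, degree)
        return np.poly1d(coeffs)

    def residual_torque(tau_rl, q, pej_curve):
        # Active motor command: the parent policy's torque minus
        # the torque already supplied passively by the PEJ.
        return tau_rl - pej_curve(q)

    def offload_ratio(q, qdot, tau_rl, pej_curve, dt):
        # Fraction of the parent policy's mechanical power that the
        # PEJ provides passively over a recorded trajectory.
        p_pej = np.sum(np.abs(pej_curve(q) * qdot)) * dt
        p_total = np.sum(np.abs(tau_rl * qdot)) * dt
        return p_pej / p_total

Under this reading, the abstract's headline result corresponds to an offload ratio of roughly 0.87 on flat terrain and 0.18 on rough terrain.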