Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Victor Lutz, Ludovic de Matteis, Virgile Batto, Nicolas Mansard
TL;DR
This work addresses control and learning for humanoid robots with parallel actuation by introducing a compact, differentiable actuation model that exactly captures the non-linear transmissions of the knee and ankle mechanisms. This Actuated Serial model provides efficient first- and second-order derivatives for trajectory optimization and an analytical impedance transfer to motor space for reinforcement learning (RL). Integrated into full-body trajectory optimization and RL and validated on hardware, the approach is more accurate and robust than constant-ratio approximations, and it allows policies trained on the serial model to be transferred to hardware using actuator-space gains. The results show practical benefits for whole-body model predictive control (WB-MPC) and RL on parallel-actuated humanoids: lower computational overhead, a larger feasible motion envelope, and better real-world performance.
Abstract
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
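To illustrate why a constant reduction ratio loses information, the sketch below uses a toy one-degree-of-freedom transmission, not the paper's knee or ankle mechanism. It assumes a linear actuator spanning a hinge, with hypothetical attachment distances `A` and `B`, so that the actuator length follows the law of cosines. The function names and numerical values are illustrative assumptions. The gain mapping is the standard first-order relation between joint-space and actuator-space impedance, which neglects the term arising from the variation of the Jacobian under a static load.

```python
import math

# Hypothetical attachment distances from the hinge (metres); illustrative only.
A, B = 0.30, 0.08


def actuator_length(q: float) -> float:
    """Actuator length for joint angle q (law of cosines on the toy linkage)."""
    return math.sqrt(A * A + B * B - 2.0 * A * B * math.cos(q))


def transmission_jacobian(q: float) -> float:
    """Analytical derivative d(length)/dq: the configuration-dependent ratio."""
    return A * B * math.sin(q) / actuator_length(q)


def joint_torque(q: float, actuator_force: float) -> float:
    """Joint torque produced by an actuator force (principle of virtual work)."""
    return transmission_jacobian(q) * actuator_force


def actuator_gains(q: float, kp_joint: float, kd_joint: float) -> tuple[float, float]:
    """Map joint-space PD gains to actuator space, to first order.

    The apparent joint stiffness of an actuator-space gain kp_act is
    J * kp_act * J, so kp_act = kp_joint / J**2 (same for damping).
    """
    j = transmission_jacobian(q)
    return kp_joint / j**2, kd_joint / j**2


if __name__ == "__main__":
    for q in (0.5, 1.0, 1.5):
        kp_act, kd_act = actuator_gains(q, kp_joint=100.0, kd_joint=2.0)
        print(f"q={q:.1f} rad  J={transmission_jacobian(q):.4f} m/rad  kp_act={kp_act:.1f}")
```

In this toy model, the actuator-space gains needed for a fixed joint-space impedance change with the joint angle, which a constant-ratio approximation cannot represent.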
