SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning

Peizhuo Li, Hongyi Li, Ge Sun, Jin Cheng, Xinrong Yang, Guillaume Bellegarda, Milad Shafiee, Yuhong Cao, Auke Ijspeert, Guillaume Sartoretti

TL;DR

SATA tackles safety concerns in legged locomotion by delivering torque-based policies that directly drive actuators. It combines a biomechanical model with a growth-based training schedule to improve exploration, stability, and zero-shot sim-to-real transfer, enabling compliant interaction with humans and deformable terrains. The approach yields high compliance, robustness to disturbances, and reliable deployment without fine-tuning, outperforming baselines across challenging scenarios. This work demonstrates that physics-aware, growth-driven torque control can surpass traditional position-based methods in safety-critical, real-world contexts.

Abstract

Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achieved in controlled real-world scenarios, these methods often struggle with compliance and adaptability when encountering environments or disturbances unseen during training, potentially resulting in extreme or unsafe behaviors. Inspired by how animals achieve smooth and adaptive movements by controlling muscle extension and contraction, torque-based policies offer a promising alternative by enabling precise and direct control of the actuators in torque space. In principle, this approach facilitates more effective interactions with the environment, resulting in safer and more adaptable behaviors. However, challenges such as a highly nonlinear state space and inefficient exploration during training have hindered their broader adoption. To address these limitations, we propose SATA, a bio-inspired framework that mimics key biomechanical principles and adaptive learning mechanisms observed in animal locomotion. Our approach effectively addresses the inherent challenges of learning torque-based policies by significantly improving early-stage exploration, leading to high-performance final policies. Remarkably, our method achieves zero-shot sim-to-real transfer. Our experimental results indicate that SATA demonstrates remarkable compliance and safety, even in challenging environments such as soft/slippery terrain or narrow passages, and under significant external disturbances, highlighting its potential for practical deployments in human-centric and safety-critical scenarios.
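The difference between the two control interfaces described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, gains, and torque limit are hypothetical, and the PD law shown is the standard textbook form rather than anything specified in the paper.

```python
from typing import List, Sequence


def pd_torques(q_target: Sequence[float], q: Sequence[float],
               qd: Sequence[float], kp: float, kd: float) -> List[float]:
    """Position-based interface: the policy outputs target joint angles and a
    low-level PD loop converts them to torques (gains kp, kd are assumed)."""
    return [kp * (qt - qi) - kd * qdi for qt, qi, qdi in zip(q_target, q, qd)]


def direct_torques(action: Sequence[float], torque_limit: float) -> List[float]:
    """Torque-based interface: the policy output is itself the torque command,
    clipped to a symmetric actuator limit (limit value is assumed)."""
    return [max(-torque_limit, min(torque_limit, a)) for a in action]


# Hypothetical single-joint example: same robot state, two interfaces.
tau_from_pd = pd_torques(q_target=[0.5], q=[0.2], qd=[1.0], kp=20.0, kd=0.5)
tau_direct = direct_torques(action=[30.0, -5.0], torque_limit=20.0)
```

In the position-based case the torque is an indirect consequence of the tracking error and the fixed gains, so an unexpected contact that increases the error also increases the torque. In the torque-based case the policy decides the torque directly, which is the property the abstract associates with compliance.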

Paper Structure

This paper contains 32 sections, 9 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Inspired by biomechanical principles and the growth mechanisms of animals in nature, we propose a framework that addresses the challenges of torque-based locomotion learning, achieving zero-shot sim-to-real transfer along with exceptional compliance and safety in challenging environments.
  • Figure 2: Overview of our SATA Framework. Dotted lines indicate parts used only during training, while solid lines indicate those used during both training and deployment. Our framework is mainly composed of 1) a Biomechanical Model (Orange) to ensure the generation of smooth, practical actuator commands $\tau$ while informing the policy of the current actuator state, and 2) a Growth Model (Green) to help the neural network train a more robust and generalizable policy by gradually adapting rewards $r_\textit{growth}$, control frequency $f_\textit{policy}$, and torque limits $\tau_\textit{limit}$ during training. Finally, we train a state estimator for real-world deployment using simulated IMU data and temporal proprioception observations (Grey), to help condition our policy on the (estimated) current robot velocity.
  • Figure 3: Ablation study of the proposed framework, showing successful training in green and failure/premature convergence in red. SATA is compared with variants that lack the biomechanical model or the growth mechanism. Notice that without the growth model (SATA w/o Growth), the policy struggles to achieve high commanded velocities (1.5 m/s), especially above the range seen during training. Without the biomechanical model (SATA w/o Biomechanical Model), the robot is completely unable to learn a coherent gait, instead learning to shift its feet on the floor asymmetrically.
  • Figure 4: Response to a sudden torque limitation on the front left leg (at $t = 0.5\,\mathrm{s}$). During this disturbance ($0.5\,\mathrm{s} < t < 1.5\,\mathrm{s}$), the robot dynamically compensates using other legs, and rapidly recovers once the torque is restored ($1.5\,\mathrm{s} < t < 1.75\,\mathrm{s}$).
  • Figure 5: Comparison of SATA and SATA w/o Growth: (a) training rewards, without $G(t)$ adaptation, and (b) cumulative rewards in a simulation test when commanded to run at 1.8 m/s (slightly out-of-distribution, OOD).
  • ...and 7 more figures
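
The Growth Model in the Figure 2 caption gradually adapts the torque limit $\tau_\textit{limit}$ and control frequency $f_\textit{policy}$ over training. The sketch below shows one way such a schedule could be written. Everything specific in it is an assumption for illustration: the linear ramp stands in for the growth function, whose actual form is not given in this summary, and the class name, step count, and start/end values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GrowthSchedule:
    """Hypothetical growth schedule: interpolates training parameters from a
    start value to an end value as training progresses."""
    total_steps: int
    torque_limit_start: float  # N*m, assumed value at the start of training
    torque_limit_end: float    # N*m, assumed value at the end of training
    policy_hz_start: float     # Hz, assumed initial control frequency
    policy_hz_end: float       # Hz, assumed final control frequency

    def growth(self, step: int) -> float:
        """Growth factor in [0, 1]; a linear ramp is assumed here."""
        return min(1.0, max(0.0, step / self.total_steps))

    def torque_limit(self, step: int) -> float:
        g = self.growth(step)
        return self.torque_limit_start + (self.torque_limit_end - self.torque_limit_start) * g

    def policy_hz(self, step: int) -> float:
        g = self.growth(step)
        return self.policy_hz_start + (self.policy_hz_end - self.policy_hz_start) * g


# Illustrative numbers only; none of these come from the paper.
schedule = GrowthSchedule(total_steps=1000,
                          torque_limit_start=10.0, torque_limit_end=30.0,
                          policy_hz_start=25.0, policy_hz_end=50.0)
limit_midway = schedule.torque_limit(500)
```

Passing start and end values explicitly keeps the sketch neutral about whether a given quantity grows or shrinks during training, since that direction is not stated in the captions above.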