SAC-Loco: Safe and Adjustable Compliant Quadrupedal Locomotion

Aoqian Zhang, Zixuan Zhuang, Chunzheng Wang, Shuzhi Sam Ge, Fan Shi, Cheng Xiang

TL;DR

A safety-aware compliant locomotion framework that integrates adjustable disturbance compliance with robust failure prevention. It introduces a learned safety critic that monitors the robot's safety in real time and coordinates between compliant locomotion and recovery behaviors.

Abstract

Quadruped robots are designed to achieve agile and robust locomotion by drawing inspiration from legged animals. However, most existing control methods for quadruped robots lack a key capacity observed in animals: the ability to exhibit diverse compliance behaviors while remaining stable under external forces. In particular, achieving adjustable compliance while maintaining robust safety under force disturbances remains a significant challenge. In this work, we propose a safety-aware compliant locomotion framework that integrates adjustable disturbance compliance with robust failure prevention. We first train a force-compliant policy with adjustable compliance levels using a teacher-student reinforcement learning framework, allowing deployment without explicit force sensing. To handle disturbances beyond the limits of compliant control, we develop a safety-oriented policy for rapid recovery and stabilization. Finally, we introduce a learned safety critic that monitors the robot's safety in real time and coordinates between compliant locomotion and recovery behaviors. Together, these components enable quadruped robots to achieve smooth force compliance and robust safety under a wide range of external force disturbances.

Paper Structure

This paper contains 16 sections, 13 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: (a) A quadruped robot compliantly follows the leader robot. (b) A quadruped robot pulls a human on a chair. (c) A: The quadruped robot moves compliantly along with a pulling force; B: it jumps toward the pulling force to restore balance.
  • Figure 2: Overview of SAC-Loco. The teacher compliant policy $\pi_{\text{comply}}^*$ with velocity modulation is first trained using privileged observations. It is then distilled into a student compliant policy $\pi_{\text{comply}}$. The failure rollouts from $\pi_{\text{comply}}$ are collected into an unsafe dataset $\mathcal{D}_{\text{unsafe}}$. A teacher safe policy $\pi_{\text{safe}}^*$ is trained, using capture point dynamics, to recover from unsafe states initialized from $\mathcal{D}_{\text{unsafe}}$ and from other large force disturbances. After $\pi_{\text{safe}}^*$ is distilled into the student safe policy $\pi_{\text{safe}}$, a safety critic $V_{\text{safe}}$ is trained to estimate the recoverability of $\pi_{\text{safe}}$. During deployment, $V_{\text{safe}}$ selects the policy that controls the robot.
  • Figure 3: (a) Average effective compliance across the compliance parameters for different ranges of disturbance magnitude. (b) Effective compliance range of different methods.
  • Figure 4: Top: Polar heatmap showing the success rate under different force magnitudes and force directions relative to the robot's heading. Bottom: Heatmap showing the success rate under different force magnitudes and impact durations.
  • Figure 5: The effect of the safety critic output threshold $\epsilon$ on SAC-Loco's performance under different force disturbance ranges.
  • ...and 3 more figures
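
The deployment step described in the Figure 2 and Figure 5 captions, where the safety critic $V_{\text{safe}}$ chooses which policy controls the robot, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function name `select_action`, the toy policies, and the toy critic are all hypothetical. The switching rule itself (hand control to $\pi_{\text{safe}}$ when the critic output falls below the threshold $\epsilon$) is an assumption, since the exact rule is not stated in this overview.

```python
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]
Action = Sequence[float]
Policy = Callable[[Observation], Action]
Critic = Callable[[Observation], float]


def select_action(
    obs: Observation,
    pi_comply: Policy,
    pi_safe: Policy,
    v_safe: Critic,
    epsilon: float,
) -> Tuple[Action, bool]:
    """Pick the controlling policy for one control step.

    Assumed rule (not confirmed by the source): if the critic's score
    drops below epsilon, the recovery policy takes over; otherwise the
    compliant policy stays in control.
    Returns the action and a flag telling whether pi_safe was used.
    """
    score = v_safe(obs)
    if score < epsilon:
        return pi_safe(obs), True
    return pi_comply(obs), False


# Hypothetical stand-ins so the sketch runs without a simulator.
def toy_comply(obs: Observation) -> Action:
    return [0.0]  # placeholder "compliant" command


def toy_safe(obs: Observation) -> Action:
    return [1.0]  # placeholder "recovery" command


def toy_critic(obs: Observation) -> float:
    # Pretend obs[0] is a tilt measure: larger tilt means a lower score.
    return 1.0 - abs(obs[0])


calm = select_action([0.1], toy_comply, toy_safe, toy_critic, epsilon=0.5)
pushed = select_action([0.8], toy_comply, toy_safe, toy_critic, epsilon=0.5)
```

Under this assumed rule, a larger `epsilon` makes the controller switch to recovery earlier, while a smaller one keeps the compliant policy in control longer; Figure 5 studies how this threshold affects performance.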