Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers
Fan Shi, Chong Zhang, Takahiro Miki, Joonho Lee, Marco Hutter, Stelian Coros
TL;DR
This work addresses the vulnerability of learning-based quadrupedal locomotion controllers by introducing sequential adversarial attacks that reveal failure modes not captured by standard robustness measures. It develops a Lipschitz-regularized RL framework to learn time-series adversaries across observation, command, and perturbation spaces, and validates the findings both in simulation and on real hardware, including the DARPA SubT-winning policy. The authors show that domain randomization alone is insufficient for robustness, and that multi-modal, terrain-aware adversaries combined with adversarial finetuning significantly improve safety and reliability, so the method serves as both a robustness diagnostic and an enhancement tool. The approach extends to other controllers (e.g., MPC) and offers practical insight for safety verification and deployment of neural locomotion policies in complex environments.
Abstract
Legged locomotion has recently achieved remarkable success with the progress of machine learning techniques, especially deep reinforcement learning (RL). Controllers employing neural networks have demonstrated empirical and qualitative robustness against real-world uncertainties, including sensor noise and external perturbations. However, formally investigating the vulnerabilities of these locomotion controllers remains a challenge. This difficulty arises from the requirement to pinpoint vulnerabilities across a long-tailed distribution within a high-dimensional, temporally sequential space. As a first step towards quantitative verification, we propose a computational method that leverages sequential adversarial attacks to identify weaknesses in learned locomotion controllers. Our research demonstrates that even state-of-the-art robust controllers can fail significantly under well-designed, low-magnitude adversarial sequences. Through experiments in simulation and on the real robot, we validate our approach's effectiveness, and we illustrate how the results it generates can be used to robustify the original policy and offer valuable insights into the safety of these black-box policies. Project page: https://fanshi14.github.io/me/rss24.html
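To make the notion of a low-magnitude adversarial sequence concrete, the following minimal sketch constrains a one-dimensional perturbation sequence in two ways: each value stays within a magnitude bound, and consecutive values differ by at most a fixed step, which bounds the sequence's Lipschitz constant in time. This is an illustrative hand-written projection, not the paper's learned adversary or its regularizer. The names `project_sequence`, `temporal_lipschitz`, `eps`, and `max_step` are hypothetical, and the random input merely stands in for the raw output of some attacker.

```python
import random


def project_sequence(deltas, eps, max_step):
    """Constrain a perturbation sequence (illustrative assumption, not the paper's method).

    Enforces |delta_t| <= eps and |delta_t - delta_{t-1}| <= max_step,
    with the sequence assumed to start from zero perturbation.
    """
    out = []
    prev = 0.0
    for d in deltas:
        # Magnitude bound: keep the perturbation inside [-eps, eps].
        d = max(-eps, min(eps, d))
        # Rate bound: move at most max_step away from the previous value.
        d = max(prev - max_step, min(prev + max_step, d))
        out.append(d)
        prev = d
    return out


def temporal_lipschitz(seq):
    """Largest absolute change between consecutive time steps."""
    return max(abs(b - a) for a, b in zip(seq, seq[1:]))


random.seed(0)
# Stand-in for an unconstrained attacker output over 200 time steps.
raw = [random.uniform(-1.0, 1.0) for _ in range(200)]
attack = project_sequence(raw, eps=0.1, max_step=0.02)

print("max magnitude:", max(abs(d) for d in attack))
print("temporal Lipschitz constant:", temporal_lipschitz(attack))
```

In this sketch the rate bound is applied after the magnitude bound; because the previous value already lies inside the magnitude bound, pulling the current value towards it cannot push it back outside.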
