QuietPaw: Learning Quadrupedal Locomotion with Versatile Noise Preference Alignment

Yuyou Zhang, Yihang Yao, Shiqi Liu, Yaru Niu, Changyi Lin, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao

TL;DR

The paper tackles loud footstep noise in quadrupedal locomotion by introducing CNCP, a conditional constrained RL framework that tunes policy behavior through a noise threshold $\epsilon$ without retraining. It leverages a successor-feature decomposition in the critics to separate state dynamics from constraint effects, enabling generalization across noise levels and improved Pareto efficiency between agility and noise reduction. In simulation in Isaac Gym and in real-world tests on a Unitree Go2, CNCP achieves continuously adjustable noise reduction while preserving locomotion performance, and it outperforms baseline conditioned policies in both cost violation and tracking error. This work advances adaptable, socially aware quadrupedal robotics with practical deployment benefits in noise-sensitive environments.
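
The successor-feature decomposition mentioned above can be pictured with the classic linear form Q(s, a; ε) = ψ(s, a)ᵀ w(ε): ψ carries everything that depends on the state and action, while w carries the effect of the condition ε. The Python sketch below illustrates only that factorization and assumes a linear critic; `psi`, `w_of_eps`, and their fixed formulas are hypothetical stand-ins for learned networks, and the paper's actual critic architecture may differ.

```python
import math
from typing import List

Vector = List[float]


def psi(state: Vector, action: Vector) -> Vector:
    """Hypothetical condition-independent features of a (state, action) pair.

    A real critic would learn this map; a fixed tanh plus a bias term is used
    here purely for illustration.
    """
    joint = state + action  # list concatenation: [s_1, ..., s_n, a_1, ..., a_m]
    return [math.tanh(v) for v in joint] + [1.0]


def w_of_eps(eps: float, dim: int) -> Vector:
    """Hypothetical condition embedding mapping the noise threshold eps to weights."""
    return [math.cos((i + 1) * eps) for i in range(dim)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def q_value(state: Vector, action: Vector, eps: float) -> float:
    """Q(s, a; eps) = psi(s, a)^T w(eps) under the assumed linear factorization."""
    features = psi(state, action)
    return dot(features, w_of_eps(eps, len(features)))


if __name__ == "__main__":
    s, a = [0.3, -0.1], [0.5]
    features = psi(s, a)  # computed once, reused for every threshold below
    for eps in (0.1, 0.5, 1.0):
        print(eps, dot(features, w_of_eps(eps, len(features))))
```

Because ψ does not depend on ε in this form, sweeping the threshold only re-evaluates the weight vector, which is one way to read the claim that the decomposition helps a single critic generalize across noise levels.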

Abstract

When operating at their full capacity, quadrupedal robots can produce loud footstep noise, which can be disruptive in human-centered environments like homes, offices, and hospitals. As a result, balancing locomotion performance with noise constraints is crucial for the successful real-world deployment of quadrupedal robots. However, achieving adaptive noise control is challenging due to (a) the trade-off between agility and noise minimization, (b) the need for generalization across diverse deployment conditions, and (c) the difficulty of effectively adjusting policies based on noise requirements. We propose QuietPaw, a framework incorporating our Conditional Noise-Constrained Policy (CNCP), a constrained learning-based algorithm that enables flexible, noise-aware locomotion by conditioning policy behavior on noise-reduction levels. We leverage value representation decomposition in the critics, disentangling state representations from condition-dependent representations; this allows a single versatile policy to generalize across noise levels without retraining while improving the Pareto trade-off between agility and noise reduction. We validate our approach in simulation and the real world, demonstrating that CNCP can effectively balance locomotion performance and noise constraints, achieving continuously adjustable noise reduction.
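
The abstract describes CNCP as a constrained learner whose behavior is conditioned on the noise-reduction level. A widely used generic recipe for this kind of problem, shown here as an assumption rather than as the paper's algorithm, is to (1) feed the threshold ε to the policy as an extra input and (2) enforce E[cost] ≤ ε with a Lagrangian penalty whose multiplier λ is updated by projected dual ascent. `toy_cost` is an invented stand-in for a real rollout, and the learning rate and thresholds are arbitrary.

```python
from typing import List


def conditioned_obs(obs: List[float], eps: float) -> List[float]:
    """Append the noise threshold so one policy network can serve every eps."""
    return obs + [eps]


def dual_step(lam: float, episode_cost: float, threshold: float, lr: float = 0.05) -> float:
    """Projected dual ascent on the Lagrange multiplier for E[cost] <= threshold.

    lam grows while the constraint is violated and shrinks toward zero
    (never below it) once the cost is under the threshold.
    """
    return max(0.0, lam + lr * (episode_cost - threshold))


def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Scalar signal a standard RL optimizer would maximize under the relaxation."""
    return reward - lam * cost


def toy_cost(lam: float) -> float:
    """Hypothetical stand-in for the noise cost a policy incurs at penalty lam.

    Larger penalties make this imaginary policy quieter.
    """
    return 1.0 / (1.0 + lam)


def run_toy(threshold: float, steps: int = 2000) -> float:
    """Iterate the dual update against toy_cost and return the final multiplier."""
    lam = 0.0
    for _ in range(steps):
        lam = dual_step(lam, toy_cost(lam), threshold)
    return lam


if __name__ == "__main__":
    for eps in (0.25, 0.5, 0.75):
        lam = run_toy(eps)
        print(f"eps={eps}: lambda={lam:.3f}, cost={toy_cost(lam):.3f}")
```

In the toy run, λ settles where `toy_cost(λ)` equals the threshold, so a tighter ε yields a larger penalty. A single ε-conditioned policy would in practice need a multiplier (or multiplier network) for each condition; the sketch does not reflect how CNCP handles this.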

Paper Structure

This paper contains 16 sections, 10 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of our Conditional Noise-Constrained Policy (CNCP). Left: CNCP conditions its policy on a noise reduction level $\epsilon$, dynamically adjusting locomotion behavior for different noise sensitivity requirements. Right: The trade-off between task performance and noise reduction forms a Pareto front, where improving one objective compromises the other. Solutions on the Pareto front achieve optimal trade-offs, while dominated solutions are suboptimal: some other solution is at least as good in both noise minimization and performance, and strictly better in one. Bottom: The same policy produces different noise levels depending on terrain type, highlighting the necessity for adaptable noise control.
  • Figure 2: (a) Trend lines showing the relationship between Normalized Cost and both Squared Impact Velocity $(m^2/s^2)$ and average top $10\%$ Impact Force $(N)$, illustrating a positive correlation. (b) Illustration of the relationship between impact velocity and impact force across different levels of Normalized Cost. As Normalized Cost increases, both squared impact velocity and impact force exhibit corresponding growth.
  • Figure 3: Pareto fronts across target velocities on even (a) and rough (b) terrains. The Pareto front represents solutions that achieve an optimal trade-off between normalized cost and tracking error, where improving one metric degrades the other. Solutions on the front (lower left) are Pareto-optimal, while dominated ones lie above or to the right, indicating suboptimal trade-offs. The shaded area represents the hypervolume [van2014multi] of each method's Pareto front, with only Pareto solutions contributing to it.
  • Figure 4: Real-world evaluation of CNCP under different terrains and target velocities. The 1 m/s deployment runs for 300 steps (6 s at 50 Hz), while the 1.5 m/s deployment runs for 200 steps (4 s).
  • Figure 5: Real-world comparison of CNCP with baseline conditional policies and an unconstrained policy. Each deployment runs for 200 steps (4 s) on rubber terrain at 1.5 m/s. The unconstrained policy remains constant across conditions because it does not take the noise reduction level as input.
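
Figure 3 compares methods by the hypervolume of their Pareto fronts in the (normalized cost, tracking error) plane. The sketch below shows the standard two-objective version of both computations, assuming both metrics are minimized and that a reference point bounds the measured region; the reference point and sample points are made up for illustration and are not the paper's data.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (normalized_cost, tracking_error), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, deduplicated and sorted by normalized cost."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded above by the reference point.

    Dominated points are discarded first, so only Pareto solutions contribute.
    """
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_y = ref[1]
    # Sorted by cost ascending, so tracking error is descending along the front;
    # each point adds the horizontal strip between its error and the previous one.
    for x, y in front:
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


if __name__ == "__main__":
    pts = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.6)]  # illustrative only
    print(pareto_front(pts))
    print(hypervolume_2d(pts, ref=(1.0, 1.0)))
```

Only non-dominated points contribute to the area, matching the caption's remark; a larger hypervolume means a front that reaches further toward the lower-left corner. The value depends on the chosen reference point, so comparisons are meaningful only when all methods share it.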