Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning

Severin Bochem, Eduardo Gonzalez-Sanchez, Yves Bicker, Gabriele Fadini

TL;DR

The paper addresses the challenge of robust sim-to-real transfer for robot locomotion by combining SHAC (Short-Horizon Actor-Critic) with Adaptive Sharpness-Aware Minimization (ASAM) to form SHAC-ASAM. This approach promotes flatter minima in the loss landscape, aiming to retain the sample efficiency of gradient-based methods while enhancing robustness to action perturbations and environmental variations. Experimental results in the contact-rich Ant and Humanoid environments show that SHAC-ASAM improves generalization and robustness relative to vanilla SHAC and approaches the robustness of the zeroth-order method PPO, albeit at a higher computational cost. The work suggests a practical path to more reliable sim-to-real transfer and indicates that sharpness-aware optimization may be applicable to other first-order reinforcement learning algorithms in robotics.

Abstract

Reinforcement learning often requires extensive training data. Simulation-to-real transfer offers a promising approach to address this challenge in robotics. While differentiable simulators offer improved sample efficiency through exact gradients, they can be unstable in contact-rich environments and may lead to poor generalization. This paper introduces a novel approach integrating sharpness-aware optimization into gradient-based reinforcement learning algorithms. Our simulation results demonstrate that our method, tested on contact-rich environments, significantly enhances policy robustness to environmental variations and action perturbations while maintaining the sample efficiency of first-order methods. Specifically, our approach improves action noise tolerance compared to standard first-order methods and achieves generalization comparable to zeroth-order methods. This improvement stems from finding flatter minima in the loss landscape, associated with better generalization. Our work offers a promising solution to balance efficient learning and robust sim-to-real transfer in robotics, potentially bridging the gap between simulation and real-world performance.
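To make the idea of sharpness-aware optimization concrete, the following is a minimal sketch of one ASAM-style update on a toy quadratic loss. It is not the authors' SHAC-ASAM implementation: the loss, the names `asam_perturbation`, `asam_step`, `toy_loss`, and `toy_grad`, and the values of `rho`, `eta`, and `lr` are all illustrative assumptions. The perturbation uses the common ASAM form with the element-wise scaling $T_w = |w| + \eta$; the variant used in the paper may differ in detail.

```python
import math

def asam_perturbation(w, grad, rho=0.5, eta=0.01):
    """Adaptive perturbation eps = rho * T_w^2 g / ||T_w g||_2, with T_w = |w| + eta."""
    t_w = [abs(wi) + eta for wi in w]                # element-wise scale (assumed form)
    scaled = [ti * gi for ti, gi in zip(t_w, grad)]  # T_w * g
    norm = math.sqrt(sum(s * s for s in scaled)) + 1e-12
    return [rho * ti * si / norm for ti, si in zip(t_w, scaled)]

def asam_step(w, grad_fn, lr=0.05, rho=0.5, eta=0.01):
    """One update: take the gradient at the perturbed point, apply it at w."""
    eps = asam_perturbation(w, grad_fn(w), rho, eta)
    w_adv = [wi + ei for wi, ei in zip(w, eps)]      # approximate worst-case neighbour
    g_adv = grad_fn(w_adv)                           # second gradient evaluation
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy stand-in for a policy loss: an anisotropic quadratic (hypothetical).
CURVATURE = [1.0, 10.0]

def toy_loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURE, w))

def toy_grad(w):
    return [a * wi for a, wi in zip(CURVATURE, w)]

w0 = [1.0, -2.0]
w1 = asam_step(w0, toy_grad)
print(toy_loss(w0), toy_loss(w1))
```

Each step evaluates the gradient twice, once at `w` and once at the perturbed point, which is consistent with the higher computational cost noted in the TL;DR. In a first-order RL setting, `grad_fn` would be replaced by the policy-loss gradient obtained through the differentiable simulator.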

Paper Structure

This paper contains 19 sections, 9 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Average episode reward heatmaps for SHAC (left) and PPO (right) policies under varying contact stiffness ($k_e$) and damping ($k_d$) in the Ant environment.
  • Figure 2: Average episode reward as a function of the noise strength for SHAC, SHAC-ASAM, and PPO. The rewards are averaged over 100 rollouts, from 3 different policies per algorithm. The shades represent the standard deviation of the reward.
  • Figure 3: Average episode reward as a function of the contact Coulomb friction for SHAC, SHAC-ASAM, and PPO. The rewards are averaged over 100 rollouts, from 3 different policies per algorithm. The shades represent the standard deviation of the averages of the reward.
  • Figure 4: Reward vs. action noise for policies trained with SHAC-SAM for different $\rho$ values, illustrating the trade-off between performance and generalization.
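
The evaluation protocol described in the captions for Figures 2 and 3 (rewards averaged over 100 rollouts, from 3 policies per algorithm, at each perturbation level) could be sketched roughly as below. Everything here is a hypothetical stand-in: the 1-D toy task, the linear policies, and the names `rollout` and `evaluate` are assumptions for illustration, not the Ant or Humanoid setup. The spread is computed across per-policy averages, which is one possible reading of the captions.

```python
import random
import statistics

def rollout(policy, noise_std, rng, horizon=50):
    """Toy 1-D task (hypothetical): drive x toward 0; reward is -x^2 per step."""
    x, total = 1.0, 0.0
    for _ in range(horizon):
        action = policy(x) + rng.gauss(0.0, noise_std)  # Gaussian action noise
        x = x + 0.1 * action
        total += -x * x
    return total

def evaluate(policies, noise_levels, n_rollouts=100, seed=0):
    """Return {noise level: (mean reward, std across per-policy means)}."""
    rng = random.Random(seed)
    results = {}
    for sigma in noise_levels:
        per_policy = [
            statistics.mean(rollout(p, sigma, rng) for _ in range(n_rollouts))
            for p in policies
        ]
        results[sigma] = (statistics.mean(per_policy),
                          statistics.pstdev(per_policy))
    return results

# Three stand-in "policies": proportional controllers with different gains.
policies = [lambda x, k=k: -k * x for k in (1.0, 2.0, 3.0)]
curve = evaluate(policies, noise_levels=[0.0, 0.5, 1.0])
for sigma, (mean_r, std_r) in curve.items():
    print(sigma, round(mean_r, 3), round(std_r, 3))
```

Plotting the mean against the noise level, with the standard deviation as a shaded band, gives a curve of the same kind as the ones the captions describe.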