Disturbance Observer-based Control Barrier Functions with Residual Model Learning for Safe Reinforcement Learning

Dvij Kalaria, Qin Lin, John M. Dolan

TL;DR

The paper tackles safe reinforcement learning under model uncertainty and disturbances by introducing RES-DOB-CBF, an almost model-free framework that combines a nominal nonlinear model with residual learning and a disturbance observer to enforce safety through a robust control barrier function. The approach integrates a DOB and residual dynamics into a high-order CBF bound and solves a safety-focused QP to override potentially unsafe RL actions, enabling safe exploration. It is validated on the Safety-Gym benchmark (Point and Car tasks) under internal and external disturbances and demonstrated on a physical F1/10 RC car, where it outperforms state-of-the-art baselines in safety and efficiency. The results indicate that RES-DOB-CBF provides stronger guarantees and faster convergence for safe RL in real-world, uncertain environments, with practical impact for robust autonomous systems.
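
The override step described above can be illustrated with a much-simplified sketch. It is not the paper's formulation: it assumes a 2D single-integrator robot `x_dot = u + d`, one circular obstacle, and a relative-degree-one CBF `h(x) = ||x - x_obs||^2 - r^2` in place of the paper's high-order CBF. The disturbance estimate `d_hat` and its error bound `err_bound` stand in for what a DOB and residual model would supply. All function and parameter names are hypothetical. With a single affine constraint, the QP "stay as close as possible to the RL action subject to the CBF condition" has a closed-form projection, so no solver is needed here.

```python
import numpy as np

def safety_filter(u_rl, x, x_obs, radius, d_hat, err_bound, alpha=1.0):
    """Minimally modify u_rl so that the robust CBF condition a @ u >= b holds.

    Assumed model (illustrative only): x_dot = u + d, with ||d - d_hat|| <= err_bound.
    Assumes x != x_obs, otherwise the constraint direction is undefined.
    """
    rel = x - x_obs
    h = rel @ rel - radius**2          # h >= 0 means outside the obstacle
    a = 2.0 * rel                      # gradient of h, multiplies u in h_dot
    # h_dot + alpha*h >= 0 for the worst-case estimation error:
    #   a @ (u + d_hat) - ||a|| * err_bound + alpha * h >= 0
    b = -alpha * h - a @ d_hat + np.linalg.norm(a) * err_bound
    slack = a @ u_rl - b
    if slack >= 0.0:
        return u_rl.copy()             # RL action already satisfies the condition
    # Closed-form solution of: min ||u - u_rl||^2  s.t.  a @ u >= b
    return u_rl - slack * a / (a @ a)

if __name__ == "__main__":
    x = np.array([2.0, 0.0])
    x_obs = np.zeros(2)
    u_rl = np.array([-2.0, 0.5])       # drives toward the obstacle
    u = safety_filter(u_rl, x, x_obs, 1.0, np.array([-0.5, 0.0]), 0.1)
    print("filtered action:", u)
```

In this sketch a larger `err_bound` shrinks the set of admissible actions, which is the sense in which a better disturbance estimate makes the filter less conservative.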

Abstract

Reinforcement learning (RL) agents need to explore their environment to learn optimal behaviors and achieve maximum rewards. However, exploration can be risky when training RL directly on real systems, while simulation-based training introduces the tricky issue of the sim-to-real gap. Recent approaches have leveraged safety filters, such as control barrier functions (CBFs), to penalize unsafe actions during RL training. However, the strong safety guarantees of CBFs rely on a precise dynamic model. In practice, uncertainties always exist, including internal disturbances from errors in the dynamics model and external disturbances such as wind. In this work, we propose a new safe RL framework based on disturbance rejection-guarded learning, which allows for almost model-free RL with an assumed but not necessarily precise nominal dynamic model. We demonstrate our results on the Safety-gym benchmark for Point and Car robots on all tasks, where our framework outperforms state-of-the-art approaches that use only residual model learning or only a disturbance observer (DOB). We further validate the efficacy of our framework using a physical F1/10 racing car. Videos: https://sites.google.com/view/res-dob-cbf-rl
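
To make the role of a disturbance observer concrete, here is a minimal sketch of a standard first-order DOB. It may differ from the observer used in the paper. It assumes a scalar system `x_dot = f_nom(x, u) + d` with nominal model `f_nom = u` and a constant unknown disturbance `d`. The observer keeps an auxiliary state `z` and forms the estimate `d_hat = z + gain * x`, so that the estimation error decays as `e_dot = -gain * e` without differentiating the measured state. The function name, gain, and test input are all hypothetical.

```python
import numpy as np

def simulate_dob(d_true=0.8, gain=5.0, dt=0.01, steps=1000):
    """Euler simulation of a first-order disturbance observer on x_dot = u + d."""
    x, z = 0.0, 0.0
    d_hat = z + gain * x                      # initial estimate (zero here)
    for k in range(steps):
        u = np.sin(0.01 * k)                  # arbitrary control input
        f_nom = u                             # assumed nominal dynamics
        z += dt * (-gain * (f_nom + d_hat))   # observer state update
        x += dt * (f_nom + d_true)            # true system, disturbance included
        d_hat = z + gain * x                  # updated disturbance estimate
    return d_hat

if __name__ == "__main__":
    print("estimated disturbance:", simulate_dob())
```

With these assumed settings the discrete error shrinks by a factor of `1 - gain * dt` each step, so the estimate approaches `d_true` regardless of the input applied.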

Paper Structure

This paper contains 11 sections, 11 equations, 17 figures, 1 table, 1 algorithm.

Figures (17)

  • Figure 1: Proposed safe RL using RES-DOB-CBF. At each time step $t$, the RL agent’s action $u_{t}^{\mathrm{RL}}$ is monitored and possibly overridden by the RES-DOB-CBF-based safety filter to avoid any unsafe actions.
  • Figure 2: Safety-gym environment tasks
  • Figure 3: Reference frame for obstacle avoidance. Blue circle: ego robot; orange circle: moving obstacle.
  • Figure 4: Comparisons for Goal1 task on Point robot
  • Figure 5: Comparisons for Goal2 task on Point robot
  • ...and 12 more figures