A Control-Barrier-Function-Based Algorithm for Policy Adaptation in Reinforcement Learning

Wenjian Hao, Zehui Lu, Nicolas Miguel, Shaoshuai Mou

TL;DR

This work formulates policy adaptation in reinforcement learning as a constrained optimization problem: minimize an added objective $J(\boldsymbol{\theta})$ while keeping the original objective $G(\boldsymbol{\theta})$ within a relaxable bound $c$ of its pretrained optimum, i.e., $G(\boldsymbol{\theta}_G^*) - G(\boldsymbol{\theta}) + c \ge 0$. It introduces a closed-loop, CBF-based policy adaptation scheme (CBF-PA) that computes a parameter update $\boldsymbol{a}(\boldsymbol{\theta})$ and the minimal relaxation $c^*$ via a quadratic program, guaranteeing constraint satisfaction by exploiting the set-invariance property of control barrier functions. The approach integrates with DDPG by augmenting the policy update with the CBF term, and it comes with theoretical results on the invariance of the constraint set as well as practical discrete-time safety considerations. Numerical experiments on Cartpole and Lunar Lander, along with experiments on a quadruped robot, show that CBF-PA preserves near-optimal original-task performance while achieving substantially lower added-task costs, with statistical evidence supporting its advantage over baseline transfer-learning approaches. Overall, CBF-PA offers a principled way to adapt pretrained policies to new objectives without retraining from scratch, which allows tasks to be extended quickly in robotics and RL settings.

Abstract

This paper considers the problem of adapting a predesigned policy, represented by a parameterized function class, from a solution that minimizes a given original cost function to a trade-off solution between minimizing the original objective and an additional cost function. The problem is formulated as a constrained optimization problem, where deviations from the optimal value of the original cost are explicitly constrained. To solve it, we develop a closed-loop system that governs the evolution of the policy parameters, with a closed-loop controller designed to adjust the additional cost gradient to ensure the satisfaction of the constraint. The resulting closed-loop system, termed control-barrier-function-based policy adaptation, exploits the set-invariance property of control barrier functions to guarantee constraint satisfaction. The effectiveness of the proposed method is demonstrated through numerical experiments on the Cartpole and Lunar Lander benchmarks from OpenAI Gym, as well as a quadruped robot, thereby illustrating both its practicality and potential for real-world policy adaptation.
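To make the closed-loop idea concrete, the following is a minimal sketch of a single CBF-style correction step, not the paper's exact quadratic program. It assumes the parameter flow $\dot{\boldsymbol{\theta}} = -\nabla J(\boldsymbol{\theta}) + \boldsymbol{a}$, the barrier $h(\boldsymbol{\theta}) = G(\boldsymbol{\theta}_G^*) - G(\boldsymbol{\theta})$, the linear class-$\mathcal{K}$ choice $\gamma_{\mathrm{h}}(h + c)$, and an assumed objective $\tfrac{1}{2}\|\boldsymbol{a}\|^2 + \tfrac{1}{2} w c^2$. The function name `cbf_pa_step` and the role given to the weight `w` are hypothetical. With a single linear constraint, this QP has a closed-form solution.

```python
import numpy as np


def cbf_pa_step(grad_J, grad_G, h, gamma_h=1.0, w=1.0):
    """Solve an illustrative one-constraint QP in closed form.

    Assumed problem (a simplification, not the paper's exact QP):
        minimize    0.5 * ||a||^2 + 0.5 * w * c^2
        subject to  -grad_G . (-grad_J + a) + gamma_h * (h + c) >= 0
    where h = G(theta_G*) - G(theta) and theta_dot = -grad_J + a.
    Returns the correction a and the relaxation c.
    """
    grad_J = np.asarray(grad_J, dtype=float)
    grad_G = np.asarray(grad_G, dtype=float)

    # Value of the constraint when no correction is applied (a = 0, c = 0).
    residual = grad_G @ grad_J + gamma_h * h
    if residual >= 0.0:
        # Plain descent on J already satisfies the condition.
        return np.zeros_like(grad_J), 0.0

    # Constraint is active: the KKT conditions give a single multiplier.
    lam = -residual / (grad_G @ grad_G + gamma_h**2 / w)
    a = -lam * grad_G          # push along the descent direction of G
    c = lam * gamma_h / w      # smallest relaxation under this objective
    return a, c


# Toy example (hypothetical): G(theta) = ||theta||^2 with optimum value 0,
# J(theta) = ||theta - [1, 0]||^2, evaluated at theta = [0.5, 0].
theta = np.array([0.5, 0.0])
grad_G = 2.0 * theta
grad_J = 2.0 * (theta - np.array([1.0, 0.0]))
h = 0.0 - theta @ theta
a, c = cbf_pa_step(grad_J, grad_G, h, gamma_h=1.0, w=1.0)
theta_dot = -grad_J + a        # corrected parameter update direction
```

In this sketch a larger `w` penalizes relaxation more heavily, so the correction `a` does more of the work; how the paper's own $w$, $\gamma_{\mathrm{h}}$, and $\alpha$ enter its quadratic program should be taken from the paper itself.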

Paper Structure

This paper contains 29 sections, 4 theorems, 49 equations, 15 figures.

Key Result

Lemma 1

For any $\boldsymbol{\theta}\in\mathbb{R}^p$, if $\kappa(h(\boldsymbol{\theta}) + c) = \gamma_{\mathrm{h}}(h(\boldsymbol{\theta})+c)$ with $\gamma_{\mathrm{h}}>0$ a given constant, then the closed-loop controller $\boldsymbol{a}(\boldsymbol{\theta})$ and minimal relaxation constant $c^*$ that solve [...], where the auxiliary terms are given by [...] and $L_f$ and $L_g$ are defined in Eq. (eq_Lh).

Figures (15)

  • Figure 1: A robot executes pretrained policies to perform original tasks. Adapted policies enable the robot to fulfill both original tasks and additional tasks.
  • Figure 2: Proposed CBF-PA under various values of $w$, shown over the contour plot of the function $G(x, y)$.
  • Figure 3: MOGD under various values of $w$, shown over the contour plot of the function $G(x, y)$.
  • Figure 4: Proposed CBF-PA for different $\gamma_{\mathrm{h}}$, plotted on the contours of $G(x, y)$.
  • Figure 5: Proposed CBF-PA for different $\alpha$, plotted on the contours of $G(x, y)$.
  • ...and 10 more figures

Theorems & Definitions (10)

  • Remark 1
  • Definition 1: CBFs for Continuous-time Dynamical Systems [ames2019control]
  • Definition 2: Constraint Satisfaction with CBFs [ames2019control]
  • Remark 2
  • Remark 3
  • Lemma 1
  • Remark 4
  • Lemma 2
  • Theorem 1
  • Lemma 3: Inter-Sample Safety Guarantees (Theorem 3 of [gurriet_applied_safety])