A Control-Barrier-Function-Based Algorithm for Policy Adaptation in Reinforcement Learning

Wenjian Hao; Zehui Lu; Nicolas Miguel; Shaoshuai Mou

TL;DR

This work formulates policy adaptation in reinforcement learning as a constrained optimization problem: minimize an added objective $J(\boldsymbol{\theta})$ while keeping the deviation from the optimal value of a pretrained objective $G(\boldsymbol{\theta})$ within a relaxable bound $c$, that is, subject to $G(\boldsymbol{\theta}_G^*) - G(\boldsymbol{\theta}) + c \ge 0$, where $\boldsymbol{\theta}_G^*$ denotes the parameters that are optimal for $G$. It introduces a closed-loop, CBF-based policy adaptation method (CBF-PA) that computes a parameter update $\boldsymbol{a}(\boldsymbol{\theta})$ and the minimal relaxation $c^*$ via a quadratic program, and it guarantees constraint satisfaction by exploiting the set-invariance property of control barrier functions. The approach integrates with DDPG by augmenting the policy update with the CBF term, and it comes with theoretical results on the invariance of the constraint set as well as practical considerations for discrete-time updates. Numerical experiments on Cartpole and Lunar Lander, along with experiments on a quadruped robot, show that CBF-PA keeps original-task performance near optimal while achieving substantially lower added-task costs, with statistical evidence supporting its advantage over baseline transfer-learning approaches. Overall, CBF-PA offers a principled way to adapt pretrained policies to new objectives without retraining from scratch.

Abstract

This paper considers the problem of adapting a predesigned policy, represented by a parameterized function class, from a solution that minimizes a given original cost function to a trade-off solution between minimizing the original objective and an additional cost function. The problem is formulated as a constrained optimization problem, where deviations from the optimal value of the original cost are explicitly constrained. To solve it, we develop a closed-loop system that governs the evolution of the policy parameters, with a closed-loop controller designed to adjust the additional cost gradient to ensure the satisfaction of the constraint. The resulting closed-loop system, termed control-barrier-function-based policy adaptation, exploits the set-invariance property of control barrier functions to guarantee constraint satisfaction. The effectiveness of the proposed method is demonstrated through numerical experiments on the Cartpole and Lunar Lander benchmarks from OpenAI Gym, as well as a quadruped robot, thereby illustrating both its practicality and potential for real-world policy adaptation.
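
The closed-loop idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the parameter flow $\dot{\boldsymbol{\theta}} = -\nabla J(\boldsymbol{\theta}) + \boldsymbol{a}$, a fixed bound $c$ (the minimal relaxation $c^*$ is not computed), a linear class-K term $\alpha h$, and toy quadratic costs in place of RL objectives. The names `cbf_correction`, `run_demo`, `alpha`, `dt`, and `target` are hypothetical. With the barrier $h(\boldsymbol{\theta}) = G(\boldsymbol{\theta}_G^*) - G(\boldsymbol{\theta}) + c$, the condition $\dot{h} \ge -\alpha h$ becomes a single linear inequality in $\boldsymbol{a}$, so the minimum-norm quadratic program has a closed-form solution.

```python
import numpy as np

def cbf_correction(grad_J, grad_G, h, alpha=1.0):
    """Minimum-norm correction a enforcing h_dot >= -alpha * h (assumed form)."""
    # Assumed flow: theta_dot = -grad_J + a, so h_dot = -grad_G . theta_dot.
    # The CBF condition is then equivalent to grad_G . a <= b, with:
    b = grad_G @ grad_J + alpha * h
    g2 = grad_G @ grad_G
    if b >= 0.0 or g2 == 0.0:
        # The plain gradient flow on J already satisfies the condition
        # (or grad_G vanishes, so no correction changes h to first order).
        return np.zeros_like(grad_J)
    # Closed-form minimizer of ||a||^2 under one linear inequality.
    return (b / g2) * grad_G

def run_demo(c=0.5, alpha=1.0, dt=0.01, steps=2000):
    """Toy problem: G = 0.5*||theta||^2, J = 0.5*||theta - target||^2."""
    target = np.array([2.0, 0.0])  # hypothetical minimizer of the added cost J
    theta = np.zeros(2)            # start at theta_G^*, the minimizer of G
    G_star = 0.0                   # G(theta_G^*)
    h_min = np.inf
    for _ in range(steps):
        grad_G = theta
        grad_J = theta - target
        h = G_star - 0.5 * (theta @ theta) + c  # barrier value, want h >= 0
        h_min = min(h_min, h)
        a = cbf_correction(grad_J, grad_G, h, alpha)
        theta = theta + dt * (-grad_J + a)      # forward-Euler step
    return theta, h_min

theta_final, h_min = run_demo()
print(theta_final, h_min)
```

In this toy setting, uncorrected gradient descent on $J$ would drive $\boldsymbol{\theta}$ to `target`, where $G = 2$ exceeds the allowed deviation $c = 0.5$. With the correction, the iterate instead approaches the boundary of the constraint set near $(1, 0)$. The forward-Euler discretization is only an approximation of the continuous-time flow, so the step size `dt` matters for keeping $h$ nonnegative in practice.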
