Optimal control barrier functions for RL based safe powertrain control
Habtamu Hailemichael, Beshah Ayalew, Andrej Ivanco
TL;DR
The paper addresses safety concerns in RL-based powertrain control by introducing high-order control barrier functions (HOCBFs) that carve out a safe but less conservative region for exploration. It develops region-specific HOCBFs for a relative-degree-2 vehicle model and integrates them as a safety filter, which uses a quadratic program to project the torques proposed by the RL agent onto safe torques. Evaluated on a medium-duty truck with an HMPO-based policy, the RL-HOCBF framework incurs no safety violations, better accommodates driver demands, and improves fuel economy by 7.6% compared to a model-based baseline, outperforming prior exponential CBF (ECBF) approaches. The approach offers a practical pathway to safe, high-performance RL in safety-critical powertrain applications, and the authors suggest extending it to additional vehicle degrees of freedom (DOFs) in future work.
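For reference, the standard high-order CBF construction for a relative-degree-2 constraint can be written out as below. The gap constraint h = d - d_0 (gap d to the lead vehicle, ego speed v, lead speed v_l, lead acceleration a_l) and the linear class-K functions alpha_i(s) = p_i s are illustrative assumptions, not necessarily the region-specific HOCBFs derived in the paper.

```latex
\begin{aligned}
\psi_0(x)   &= h(x) = d - d_0, \qquad \dot d = v_l - v, \\
\psi_1(x)   &= \dot\psi_0 + p_1 \psi_0 = (v_l - v) + p_1 (d - d_0), \\
\psi_2(x,u) &= \dot\psi_1 + p_2 \psi_1
             = a_l - \dot v + (p_1 + p_2)(v_l - v) + p_1 p_2 (d - d_0) \ge 0, \\
u^\star     &= \arg\min_{u} \; (u - u_{\mathrm{RL}})^2
               \quad \text{s.t.} \quad \psi_2(x,u) \ge 0, \;\; u_{\min} \le u \le u_{\max}.
\end{aligned}
```

Because the ego acceleration \dot v depends directly on the commanded torque u, the condition psi_2 >= 0 is affine in u, which is why the safety filter can be posed as a quadratic program.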
Abstract
Reinforcement learning (RL) can improve control performance by learning optimal control policies in the end-use environment of vehicles and other systems. To accomplish this, RL algorithms need to sufficiently explore the state and action spaces. Such exploration poses inherent safety risks, so applying RL to safety-critical systems like vehicle powertrain control requires safety enforcement approaches. In this paper, we seek control barrier function (CBF)-based safety certificates that demarcate safe regions within which the RL agent can optimize control performance. In particular, we derive optimal high-order CBFs that avoid conservatism while ensuring safety for a vehicle in traffic. We demonstrate the high-order CBF with an RL agent that uses a deep actor-critic architecture to learn to optimize fuel economy and other driver accommodation metrics. We find that the optimized high-order CBF allows the RL-based powertrain control agent to achieve higher total rewards without any crashes in either training or evaluation, while accommodating driver demands better than previously proposed exponential barrier function filters and model-based baseline controllers.
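A minimal sketch of such a CBF safety filter, under loudly stated assumptions: a single-input longitudinal model with net wheel torque u and dynamics dv/dt = (u / r - F_res(v)) / m, a relative-degree-2 gap constraint h = d - d0, and fixed linear class-K gains p1 and p2. All names (TruckParams, hocbf_torque_bound, safety_filter) and all numeric values are hypothetical; the paper's optimized, region-specific HOCBFs and its actual truck model are not reproduced here. With one input and one affine constraint, the quadratic-program projection has a closed form (clipping), so no solver is needed.

```python
from dataclasses import dataclass


@dataclass
class TruckParams:
    """Hypothetical longitudinal-model parameters (illustrative, not from the paper)."""
    mass: float = 15000.0       # vehicle mass [kg]
    wheel_radius: float = 0.5   # effective wheel radius [m]
    c_roll: float = 0.007       # rolling-resistance coefficient [-]
    c_aero: float = 3.0         # lumped 0.5 * rho * Cd * A [kg/m]
    g: float = 9.81             # gravitational acceleration [m/s^2]


def resistive_force(v: float, p: TruckParams) -> float:
    """Rolling plus aerodynamic resistance on a flat road [N]."""
    return p.c_roll * p.mass * p.g + p.c_aero * v * v


def hocbf_torque_bound(d, v, v_lead, a_lead, p, d0=5.0, p1=1.0, p2=1.0):
    """Largest wheel torque [N*m] satisfying psi_2 >= 0 for h = d - d0.

    psi_2 = a_lead - dv/dt + (p1 + p2)(v_lead - v) + p1*p2*(d - d0) >= 0,
    with dv/dt = (u / r - F_res(v)) / m, rearranged into u <= bound.
    """
    max_accel = a_lead + (p1 + p2) * (v_lead - v) + p1 * p2 * (d - d0)
    return (p.mass * max_accel + resistive_force(v, p)) * p.wheel_radius


def safety_filter(u_rl, d, v, v_lead, a_lead, p,
                  u_min=-40000.0, u_max=20000.0, **gains):
    """Closed-form solution of min (u - u_rl)^2 s.t. u_min <= u <= min(u_max, bound).

    For a scalar input with one affine constraint, the QP reduces to clipping.
    If the feasible interval is empty, fall back to u_min (maximum braking);
    this fallback is an assumption, not a guarantee from HOCBF theory.
    """
    upper = min(u_max, hocbf_torque_bound(d, v, v_lead, a_lead, p, **gains))
    if upper < u_min:
        return u_min
    return min(max(u_rl, u_min), upper)


if __name__ == "__main__":
    params = TruckParams()
    # Ample gap at matched speeds: the RL torque passes through unchanged.
    print(safety_filter(15000.0, d=30.0, v=20.0, v_lead=20.0, a_lead=0.0, p=params))
    # Closing in on a slower leader: the torque is clipped to the HOCBF bound.
    print(safety_filter(15000.0, d=10.0, v=22.0, v_lead=20.0, a_lead=0.0, p=params))
```

With several inputs (for example engine torque together with brake or gear commands) the projection no longer reduces to clipping and a QP solver would be needed. The forward-invariance guarantee also assumes the initial state already satisfies h >= 0 and (v_lead - v) + p1 * (d - d0) >= 0.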
