Adaptive Safe Reinforcement Learning-Enabled Optimization of Battery Fast-Charging Protocols

Myisha A. Chowdhury; Saif S. S. Al-Wahaibi; Qiugang Lu

TL;DR

The paper proposes an adaptive safe reinforcement learning (RL) framework for fast battery charging. It pairs TD3 as the learning algorithm with an action-projection safety layer: Gaussian process (GP) surrogates predict the constrained quantities (temperature and voltage), and the constraints are enforced on an upper confidence bound of those predictions, so that safety holds with high probability. A static GP-based safety model serves as the baseline, while an adaptive GP scheme learns residual dynamics online to cope with changes caused by ambient conditions and battery aging. Simulations in PyBaMM show that adaptive safe TD3 keeps temperature and voltage within their limits under varying conditions, at the cost of somewhat more conservative charging, which supports its practical potential for reliable fast-charging protocols.
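To make the projection step concrete, here is a minimal sketch for a scalar charging current. It is not the authors' implementation: the surrogate functions `temp_model` and `volt_model`, the limits (40 and 4.2), the current range, the confidence multiplier `beta`, and the grid search standing in for the constrained optimization are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical surrogate type: maps a candidate current to the
# (mean, std) of a next-step quantity such as temperature or voltage.
Surrogate = Callable[[float], Tuple[float, float]]


def project_action(
    a_rl: float,
    surrogates: Sequence[Surrogate],
    limits: Sequence[float],
    beta: float = 2.0,      # assumed confidence multiplier
    a_min: float = 0.0,     # assumed current range
    a_max: float = 6.0,
    n_grid: int = 601,
) -> float:
    """Return the feasible current closest to a_rl.

    A current is feasible when mean + beta * std stays at or below the
    limit for every surrogate. Falls back to a_min when nothing on the
    grid is feasible (an assumed, conservative choice).
    """
    def is_safe(a: float) -> bool:
        for model, limit in zip(surrogates, limits):
            mean, std = model(a)
            if mean + beta * std > limit:
                return False
        return True

    a_rl = min(max(a_rl, a_min), a_max)
    if is_safe(a_rl):
        return a_rl  # safe actions pass through unchanged

    step = (a_max - a_min) / (n_grid - 1)
    candidates = [a_min + i * step for i in range(n_grid)]
    feasible = [a for a in candidates if is_safe(a)]
    if not feasible:
        return a_min
    return min(feasible, key=lambda a: (a - a_rl) ** 2)


# Toy linear surrogates, purely illustrative.
def temp_model(a: float) -> Tuple[float, float]:
    return 30.0 + 2.0 * a, 0.5


def volt_model(a: float) -> Tuple[float, float]:
    return 3.9 + 0.05 * a, 0.01


safe = project_action(5.0, [temp_model, volt_model], [40.0, 4.2])
print(f"proposed 5.0 A, projected to {safe:.2f} A")
```

In this toy setting the temperature bound is the active constraint, so the proposed current is pulled back to roughly the largest value whose upper confidence bound still meets the temperature limit. A larger `beta` shrinks the safe set, which is one way to see why such a layer can yield conservative charging.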

Abstract

Optimizing charging protocols is critical for reducing battery charging time and slowing battery degradation in applications such as electric vehicles. Recently, reinforcement learning (RL) methods have been adopted for this purpose. However, RL-based methods may not guarantee that system (safety) constraints are satisfied, and constraint violations can cause irreversible damage to batteries and reduce their lifetime. To this end, this work proposes an adaptive and safe RL framework that optimizes fast-charging strategies while respecting safety constraints with high probability. In our method, any unsafe action chosen by the RL agent is projected into a safety region by solving a constrained optimization problem. The safety region is constructed using adaptive Gaussian process (GP) models, consisting of static and dynamic GPs, that learn from online experience to account for changes in battery dynamics. Simulation results show that our method can charge batteries rapidly while satisfying the constraints under varying operating conditions.
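The idea of pairing a static model with a dynamic GP that learns from online experience can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration rather than the paper's formulation: the scalar input (charging current only), the RBF kernel and its hyperparameters, the toy `static_temp` and `true_temp` functions, and the rule of adding the two variances.

```python
import math


def rbf(x1, x2, length=1.0, var=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


class ResidualGP:
    """Zero-mean GP fitted online to (input, residual) pairs."""

    def __init__(self, noise=1e-4, max_points=30):
        self.noise, self.max_points = noise, max_points
        self.X, self.y = [], []

    def add(self, x, residual):
        self.X.append(x)
        self.y.append(residual)
        # Keep only recent points so the model can track slow drift.
        self.X = self.X[-self.max_points:]
        self.y = self.y[-self.max_points:]

    def predict(self, x):
        if not self.X:
            return 0.0, math.sqrt(rbf(x, x))  # prior mean and std
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j]) + (self.noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        k = [rbf(x, xi) for xi in self.X]
        alpha = solve(K, self.y)
        v = solve(K, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v))
        return mean, math.sqrt(max(var, 0.0))


def static_temp(current):
    """Hypothetical offline model: next-step temperature (mean, std)."""
    return 25.0 + 2.0 * current, 0.2


def true_temp(current):
    """Toy 'plant' that runs 3 degrees hotter than the static model."""
    return 28.0 + 2.0 * current


def adaptive_temp(current, residual_gp):
    m_s, s_s = static_temp(current)
    m_r, s_r = residual_gp.predict(current)
    # Assumption: treat the two errors as independent and add variances.
    return m_s + m_r, math.sqrt(s_s ** 2 + s_r ** 2)


gp = ResidualGP()
for current in [1.0, 2.0, 3.0, 4.0]:
    gp.add(current, true_temp(current) - static_temp(current)[0])

mean, std = adaptive_temp(2.5, gp)
static_error = abs(static_temp(2.5)[0] - true_temp(2.5))
adaptive_error = abs(mean - true_temp(2.5))
print(f"static error {static_error:.2f}, adaptive error {adaptive_error:.2f}")
```

The static model is left untouched and only the mismatch is learned, so a few online samples are enough to shrink the prediction error in this toy case. Keeping a bounded window of recent residuals is one simple way to follow drift; the paper may handle data selection differently.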

Paper Structure (16 sections, 19 equations, 8 figures, 1 table)


Introduction
Preliminaries
Reinforcement learning
TD3 algorithm
Action projection-based safe RL
Gaussian process (GP) model
Action projection-based safe RL
Safe RL for fast charging optimization
Fast-charging optimization formulation
Static safe RL for fast charging optimization
Adaptive safe RL for varying environments
Simulation Results and Discussions
Safe RL-based fast charging with fixed conditions
Safe RL-based fast charging with varying conditions
Conclusion
...and 1 more section

Figures (8)

Figure 1: Schematics of the proposed action projection-based safe RL.
Figure 2: The predicted next-step temperature (a) and voltage (b) by static (red) and adaptive (blue) GP models against the true temperature and voltage (dashed line) under different charging currents. The GP models are trained and tested under a fixed ambient temperature of $25^\circ\mathrm{C}$. Solid lines: posterior mean; Shaded areas: $\pm$3 standard deviations.
Figure 3: Training performance of traditional (green), safe (red), and adaptive safe (blue) TD3 in optimizing battery fast-charging protocols. (a) Cumulative rewards; (b) Charging time; (c) Maximum temperature; and (d) Maximum voltage, of each training episode. Magenta dashed: the allowed upper bounds of temperature and voltage.
Figure 4: The (a) charging current, (b) SOC, (c) temperature, and (d) voltage profiles, obtained by deploying the optimized protocols from the traditional (green), safe (red), adaptive safe (blue) TD3, and from classical CCCV (malibu). Magenta dashed lines in (c) and (d): allowed upper bounds of the temperature and voltage. Solid lines in (a): safe current profiles after projection.
Figure 5: The predicted next-step temperature (a) and voltage (b) by static (red) and adaptive (blue) GP models against the true temperature and voltage (dashed line) under different charging currents. The GP models are trained at an ambient temperature of $10^\circ\mathrm{C}$ and tested at $36^\circ\mathrm{C}$. Solid lines: posterior mean; Shaded areas: $\pm$3 standard deviations.
...and 3 more figures
