Adaptive BESS and Grid Setpoints Optimization: A Model-Free Framework for Efficient Battery Management under Dynamic Tariff Pricing
Alaa Selim, Huadong Mo, Hemanshu Pota, Daoyi Dong
TL;DR
The paper tackles non-convex BESS and grid setpoint optimization under dynamic tariffs by first establishing a gradient-based Adam benchmark and then deploying a model-free DRL framework built on off-policy Soft Actor-Critic (SAC). It introduces RMFEMF, a data-driven, model-free environment trained on high-resolution Australian field data, with a physics safety layer that enforces SOC and power-balance constraints and an entropy-aware uncertainty representation. Key contributions are reward refinement with logarithmic scaling, a safety mechanism that restricts the agent to feasible actions, and entropy-based uncertainty handling. In the reported results, SOC stays above 50% at the end of each day, while optimization time falls by about 50% and cost by around 40% relative to the gradient benchmark. The approach is reported to be robust and transferable across regions and uncertainty distributions, offering a practical pathway for real-time BESS management in smart buildings under dynamic tariff pricing.
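To make the idea of a physics safety layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lossless battery, a sign convention in which positive battery power means charging, and a simple power balance (grid = load - PV + battery). The names `BatteryLimits`, `safe_battery_power`, and `step`, and all numeric defaults, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatteryLimits:
    # All values are illustrative assumptions, not taken from the paper.
    capacity_kwh: float = 10.0  # usable energy capacity
    soc_min: float = 0.2        # lower SOC bound (fraction of capacity)
    soc_max: float = 0.9        # upper SOC bound (fraction of capacity)
    p_max_kw: float = 5.0       # converter power rating
    dt_h: float = 0.5           # control interval in hours


def safe_battery_power(p_batt_kw: float, soc: float, lim: BatteryLimits) -> float:
    """Clip a requested battery power so the next SOC stays within bounds.

    Positive power charges the battery, negative power discharges it.
    Charging and discharging losses are ignored (assumption).
    """
    # Largest charging power that still keeps SOC <= soc_max after one step.
    p_charge_cap = (lim.soc_max - soc) * lim.capacity_kwh / lim.dt_h
    # Largest discharging power that still keeps SOC >= soc_min after one step.
    p_discharge_cap = (soc - lim.soc_min) * lim.capacity_kwh / lim.dt_h
    upper = min(lim.p_max_kw, max(p_charge_cap, 0.0))
    lower = -min(lim.p_max_kw, max(p_discharge_cap, 0.0))
    return min(max(p_batt_kw, lower), upper)


def step(p_batt_kw: float, soc: float, load_kw: float, pv_kw: float,
         lim: BatteryLimits) -> tuple[float, float, float]:
    """Apply the safety clip, then derive the grid setpoint from power balance."""
    p_safe = safe_battery_power(p_batt_kw, soc, lim)
    next_soc = soc + p_safe * lim.dt_h / lim.capacity_kwh
    p_grid_kw = load_kw - pv_kw + p_safe  # power balance (assumed form)
    return p_safe, p_grid_kw, next_soc
```

In this sketch the agent's raw action is never executed directly: it is clipped first, and the grid setpoint then follows from the balance equation, so both constraints hold by construction.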
Abstract
This paper introduces an enhanced framework for managing Battery Energy Storage Systems (BESS) in residential communities. The non-convex BESS control problem is first addressed using a gradient-based optimizer, which provides a benchmark solution. The problem is then tackled using multiple Deep Reinforcement Learning (DRL) agents, with particular emphasis on the off-policy Soft Actor-Critic (SAC) algorithm. The SAC variant used here refines the reward for this non-convex problem, applying logarithmic scaling to improve the convergence rate. In addition, a safety mechanism selects only feasible actions from the action space, with the aim of improving the learning curve, accelerating convergence, and reducing computation time. The state representation of the DRL approach also includes uncertainties quantified in the entropy term, which enhances the model's adaptability across various entropy types. The resulting system adheres to strict limits on the battery's State of Charge (SOC), preventing breaches of the SOC boundaries and extending battery lifespan. The robustness of the model is validated across districts in several Australian states, each characterized by a unique uncertainty distribution. With the refined SAC, the SOC consistently exceeds 50 percent at the end of each day, so the BESS control can start the next day smoothly with some reserve. Finally, the proposed DRL method achieves a mean reduction in optimization time of 50 percent and an average cost saving of 40 percent compared with the gradient-based optimization benchmark.
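The abstract does not give the exact form of the logarithmic reward scaling. As an illustration only, one common way to compress a wide-ranging cost signal is a signed `log1p` transform; the function name `log_scaled_reward` and the choice of this particular transform are assumptions, not the paper's formula.

```python
import math


def log_scaled_reward(cost: float) -> float:
    """Map a step cost to a reward with signed logarithmic compression.

    Positive cost (money spent) yields a negative reward; negative cost
    (revenue) yields a positive reward. The magnitude grows as
    log(1 + |cost|), so large tariff spikes do not dominate the updates.
    This specific form is an assumption for illustration.
    """
    return -math.copysign(math.log1p(abs(cost)), cost)
```

The transform is monotonic, so a cheaper action always receives a higher reward than a more expensive one, while the reward range stays narrow even when tariffs vary by orders of magnitude.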
