Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour
Meng Yuan, Ye Wang, Xinghuo Yu, Torsten Wik, Changfu Zou
TL;DR
The paper tackles home energy management with PV, stationary storage, and EVs under stochastic EV usage by formulating a constrained Markov decision process (CMDP) and solving it with a Lagrangian Soft Actor-Critic (SAC) algorithm. It explicitly incorporates the heterogeneous degradation dynamics of stationary and EV batteries, and it models stochastic EV arrival/departure times and driving distances using Swedish travel data, so that the learned policy is robust to realistic usage patterns. Compared with rule-based baselines, the approach exploits price arbitrage, keeps indoor temperature within tight comfort bounds, and reduces total cost while lowering battery degradation. This offers a practical, data-driven framework for real-world home energy management systems (HEMS) that improves economic performance and battery longevity without sacrificing occupant comfort. The results highlight the importance of accounting for technology heterogeneity and user behaviour in DRL-based home energy management.
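To make the heterogeneous-degradation idea concrete, the sketch below scores one control step as grid energy cost plus a separate wear cost for each battery. It is an illustration under stated assumptions, not the paper's model: the linear throughput-based wear cost, the `Battery`, `degradation_cost` and `step_cost` names, the sign convention and all prices are hypothetical. The paper's actual degradation dynamics are not reproduced here; a throughput model is simply the smallest way to give the two batteries different wear costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Battery:
    """Hypothetical battery descriptor; values are illustrative, not from the paper."""
    name: str
    capacity_kwh: float
    # Assumed linear throughput-based wear cost (currency per kWh moved).
    wear_cost_per_kwh: float


def degradation_cost(battery: Battery, power_kw: float, dt_h: float) -> float:
    """Assumed model: wear cost proportional to |energy| passed through the battery."""
    return battery.wear_cost_per_kwh * abs(power_kw) * dt_h


def step_cost(price: float, pv_kw: float, load_kw: float,
              ess_kw: float, ev_kw: float,
              ess: Battery, ev: Battery,
              dt_h: float = 1.0, feed_in_price: float = 0.0) -> float:
    """One-step operating cost: grid energy cost plus per-battery degradation.

    Sign convention (an assumption): positive ess_kw / ev_kw means charging,
    negative means discharging.
    """
    net_kw = load_kw + ess_kw + ev_kw - pv_kw  # > 0: grid import, < 0: export
    if net_kw >= 0:
        energy_cost = price * net_kw * dt_h
    else:
        energy_cost = feed_in_price * net_kw * dt_h  # negative cost = export revenue
    # Each battery is charged its own wear cost (the "heterogeneous" part).
    deg = degradation_cost(ess, ess_kw, dt_h) + degradation_cost(ev, ev_kw, dt_h)
    return energy_cost + deg


if __name__ == "__main__":
    ess = Battery("ESS", capacity_kwh=10.0, wear_cost_per_kwh=0.02)
    ev = Battery("EV", capacity_kwh=60.0, wear_cost_per_kwh=0.05)
    # Charge the stationary battery at 2 kW during a 0.3/kWh hour with 1 kW of load.
    print(step_cost(0.3, pv_kw=0.0, load_kw=1.0, ess_kw=2.0, ev_kw=0.0, ess=ess, ev=ev))
```

In this sketch, keeping a distinct wear cost per battery is what lets an agent prefer cycling the cheaper-to-wear storage for price arbitrage; with a single shared cost, that distinction would disappear.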
Abstract
The widespread adoption of photovoltaic (PV) systems, electric vehicles (EVs), and stationary energy storage systems (ESS) in households increases system complexity while simultaneously offering new opportunities for energy regulation. However, effectively coordinating these resources under uncertainty remains challenging. This paper proposes a novel home energy management framework based on deep reinforcement learning (DRL) that jointly minimises energy expenditure and battery degradation while guaranteeing occupant comfort and EV charging requirements. Distinct from existing studies, we explicitly account for the heterogeneous degradation characteristics of stationary and EV batteries in the optimisation, alongside stochastic user behaviour regarding arrival time, departure time, and driving distance. The energy scheduling problem is formulated as a constrained Markov decision process (CMDP) and solved using a Lagrangian soft actor-critic (SAC) algorithm. This approach enables the agent to learn optimal control policies that enforce physical constraints, including indoor temperature bounds and the target EV state of charge upon departure, despite these uncertainties. Numerical simulations over a one-year horizon demonstrate that the proposed framework satisfies the physical constraints, eliminates thermal oscillations, and achieves significant economic benefits. Specifically, compared with two standard rule-based baselines, the method substantially reduces the cumulative operating cost while also decreasing battery degradation costs by 8.44%.
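The constraint handling described above can be illustrated with the generic Lagrangian relaxation that is commonly paired with SAC for CMDPs: each constraint gets a non-negative multiplier, the critic is trained on a penalised reward, and the multipliers are adjusted by projected gradient ascent. This is a minimal sketch, not the authors' implementation; the function names, the two constraint costs (degrees outside the comfort band and EV state-of-charge shortfall at departure), the constraint limits and the learning rate are all hypothetical choices.

```python
from typing import List, Sequence


def constraint_costs(temp_c: float, t_min: float, t_max: float,
                     departing: bool, ev_soc: float, soc_target: float) -> List[float]:
    """Hypothetical per-step constraint costs for the two constraints in the abstract."""
    # Cost 0: how far indoor temperature lies outside the comfort band [t_min, t_max].
    temp_violation = max(0.0, t_min - temp_c) + max(0.0, temp_c - t_max)
    # Cost 1: EV state-of-charge shortfall, counted only at the departure step.
    soc_shortfall = max(0.0, soc_target - ev_soc) if departing else 0.0
    return [temp_violation, soc_shortfall]


def lagrangian_reward(reward: float, costs: Sequence[float],
                      lambdas: Sequence[float]) -> float:
    """Penalised reward a SAC critic would be trained on: r - sum_i lambda_i * c_i."""
    return reward - sum(lam * c for lam, c in zip(lambdas, costs))


def dual_update(lambdas: Sequence[float], avg_costs: Sequence[float],
                limits: Sequence[float], lr: float) -> List[float]:
    """Projected gradient ascent: lambda_i <- max(0, lambda_i + lr * (J_ci - d_i)).

    A multiplier grows while its constraint is violated on average (J_ci > d_i)
    and shrinks toward zero, never below it, once the constraint is satisfied.
    """
    return [max(0.0, lam + lr * (j - d))
            for lam, j, d in zip(lambdas, avg_costs, limits)]


if __name__ == "__main__":
    lambdas = [0.0, 0.0]
    # A cold house (17 C vs a 20-24 C band) and an EV leaving at 70 % of an 80 % target.
    costs = constraint_costs(17.0, 20.0, 24.0, departing=True, ev_soc=0.7, soc_target=0.8)
    lambdas = dual_update(lambdas, costs, limits=[0.0, 0.0], lr=0.1)
    print(lambdas, lagrangian_reward(-1.0, costs, lambdas))
```

Because the multipliers rise only while a constraint is violated on average, the penalty weights adapt during training instead of being hand-tuned; in a full agent, `avg_costs` would typically be estimated from recent episodes or from a separate cost critic.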
