A proximal policy optimization based intelligent home solar management
Kode Creer, Imtiaz Parvez
TL;DR
The paper tackles profit maximization for prosumers in a smart-home solar setup under dynamic electricity markets. It proposes a proximal policy optimization (PPO) framework augmented with recurrent rewards, generalized advantage estimation (GAE), soliton-based embeddings, and a sparse mixture-of-experts (MOE) forecaster to handle limited data and long-horizon planning. Empirical results show that PPO delivers roughly a 30% boost in accumulated profits over naive baselines, with soliton embeddings improving generalization and reducing overfitting. The work offers a practical RL approach to long-term energy trading decisions and introduces data-augmentation and reward-structuring techniques applicable to related financial and sequential decision tasks. The code is released as open source.
Abstract
In the smart grid, prosumers who own renewable energy sources and storage units can sell unused electricity back to the power grid. Maximizing their profits under a dynamic electricity market requires intelligent planning. To address this, we propose a framework based on Proximal Policy Optimization (PPO) using recurrent rewards. By modeling the reward information effectively and using PPO to maximize our objective, we obtained an improvement of over 30% in accumulated total profit compared with naive algorithms. This shows promise for applying reinforcement learning algorithms to tasks that require planning actions in complex domains such as financial markets. We also introduce a novel method for embedding longs based on soliton waves, which outperformed normal embedding in our use case with random floating-point data augmentation.
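
For readers unfamiliar with the two standard components named above, the following is a minimal sketch of generalized advantage estimation and the PPO clipped surrogate objective in plain Python. It is a generic textbook formulation, not the authors' implementation: the function names, the default values of `gamma`, `lam`, and `clip_eps`, and the toy profit numbers are all assumptions made for illustration.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio
        # outside the clipping interval.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical per-step profits and critic estimates (illustrative only).
    step_profits = [1.0, -0.5, 2.0]
    value_estimates = [0.8, 0.6, 1.5, 0.0]
    adv = gae_advantages(step_profits, value_estimates)
    print(adv)
    print(ppo_clipped_objective([-0.9, -1.1, -0.7], [-1.0, -1.0, -1.0], adv))
```

In a profit-maximization setting such as the one described here, the per-step reward would be derived from the trading outcome, and the advantage estimates would weight the policy update toward actions that led to higher accumulated profit than the critic expected.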
