Generalized Policy Learning for Smart Grids: FL TRPO Approach

Yunxiang Li; Nicolas Mauricio Cuadrado; Samuel Horváth; Martin Takáč

TL;DR

This paper introduces FL TRPO, a framework that combines Federated Learning (FL) with Trust Region Policy Optimization (TRPO) to reduce energy-associated emissions and costs. Experiments demonstrate that the approach is robust and learns effective policy models for smart grid problems.

Abstract

The smart grid domain requires more capable energy management systems. Federated Learning (FL) aligns with this goal: it can train models on heterogeneous datasets while maintaining data privacy, which makes it suitable for smart grid applications. These applications often involve disparate data distributions and interdependencies among features that make linear models unsuitable. This paper introduces a framework, FL TRPO, that combines FL with Trust Region Policy Optimization (TRPO) to reduce energy-associated emissions and costs. Our approach reveals latent interconnections among features and employs personalized encoding methods to capture unique insights, learning the relationships between features and optimal strategies so that the model generalizes to previously unseen data. Experimental results validate the robustness of our approach and show that it effectively learns policy models for smart grid challenges.
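To make the combination of federated averaging and personalized encoding concrete, the following is a minimal sketch, not the authors' implementation. It assumes each client (building) holds a parameter dictionary in which some groups are shared and averaged by the server while others stay local. The group names `encoder` and `policy`, the choice of which group is personalized, and the helper names `fedavg_shared` and `apply_global` are all hypothetical. The local TRPO update that would run between averaging rounds is omitted.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def fedavg_shared(client_params: List[Params],
                  client_sizes: List[int],
                  shared_keys: List[str]) -> Params:
    """Average only the shared parameter groups, weighted by client data size."""
    total = sum(client_sizes)
    averaged: Params = {}
    for key in shared_keys:
        length = len(client_params[0][key])
        averaged[key] = [
            sum(p[key][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def apply_global(local: Params, global_shared: Params) -> Params:
    """Overwrite shared groups with the server average; keep personalized groups."""
    updated = {k: list(v) for k, v in local.items()}
    for key, values in global_shared.items():
        updated[key] = list(values)
    return updated


# Two hypothetical buildings. "encoder" is assumed to be personalized (kept local),
# "policy" is assumed to be shared (averaged by the server).
clients = [
    {"encoder": [1.0, 2.0], "policy": [0.0, 4.0]},
    {"encoder": [5.0, 6.0], "policy": [2.0, 8.0]},
]
sizes = [1, 3]  # assumed number of local samples per building

global_shared = fedavg_shared(clients, sizes, shared_keys=["policy"])
new_clients = [apply_global(c, global_shared) for c in clients]
```

After one round in this sketch, every client holds the same `policy` values, while each `encoder` remains as it was trained locally.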

Paper Structure (10 sections, 5 equations, 7 figures, 1 table)

Introduction
Related Work
Problem and Environment
Method
Experiments
Conclusion
Preliminaries
FedAvg
TRPO
Experiment settings

Figures (7)

Figure 1: Our model captures the inherent interdependencies among features in the mapping between states and policies.
Figure 2: Average reward and emission of the five buildings across five random seeds. Upper bound (Blue): a single TRPO agent trained on the testing dataset to establish the upper performance limit. FL (Green): a model with all parts shared, trained with the FL methodology. Ind. Agent (Red): a TRPO agent trained separately for each building. FL Personalization (Orange): FL TRPO with personalized encoding as detailed in Section \ref{sec:model}, trained with the FL methodology.
Figure 3: Scenario definition.
Figure 4: Training and testing solar generation data of each building.
Figure 5: Training and testing non-shiftable load data of each building.
...and 2 more figures
