Addressing Heterogeneity in Federated Load Forecasting with Personalization Layers

Shourya Bose, Yu Zhang, Kibaek Kim

TL;DR

The paper tackles privacy-preserving short-term load forecasting in federated settings, where data heterogeneity across clients degrades the performance of standard FL. It introduces personalization layers (PL-FL), which partition model parameters into shared and personalized subsets so that only the shared parameters are communicated; this reduces bandwidth and better accommodates non-i.i.d. data. Using an LSTM-based forecasting model evaluated on the NREL ComStock dataset across three regions, PL-FL with a personalized MLP and the Adam optimizer achieves the best accuracy while requiring less communication than conventional FL, although some datasets remain challenging. The results suggest PL-FL is a practical, bandwidth-efficient approach for privacy-conscious smart grid forecasting, with future work focusing on distributed inference for real-world deployment on edge devices.

Abstract

The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models. In response to privacy concerns, federated learning (FL) has been proposed as a privacy-preserving approach for training, but the quality of trained models degrades as client data becomes heterogeneous. In this paper we propose the use of personalization layers for load forecasting in a general framework called PL-FL. We show that PL-FL outperforms FL and purely local training, while requiring lower communication bandwidth than FL. This is done through extensive simulations on three different datasets from the NREL ComStock repository.
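
The parameter split described above can be illustrated with a minimal sketch. It assumes plain averaging of the shared parameters on the server (the paper evaluates several server algorithms, so this is only one possible choice), and every name here (`split_params`, `average_shared`, `plfl_round`, and the `lstm.w` / `mlp.w` keys) is hypothetical rather than taken from the authors' code.

```python
from typing import Dict, List, Set, Tuple

# A model is represented as a mapping from parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def split_params(params: Params, personalized_keys: Set[str]) -> Tuple[Params, Params]:
    """Partition a client's parameters into (shared, personalized) subsets."""
    shared = {k: v for k, v in params.items() if k not in personalized_keys}
    personal = {k: v for k, v in params.items() if k in personalized_keys}
    return shared, personal


def average_shared(uploads: List[Params]) -> Params:
    """Elementwise mean of the shared parameters (an assumed FedAvg-style server step)."""
    n = len(uploads)
    return {
        k: [sum(vals) / n for vals in zip(*(u[k] for u in uploads))]
        for k in uploads[0]
    }


def plfl_round(clients: List[Params], personalized_keys: Set[str]) -> List[Params]:
    """One communication round: only shared parameters leave each client."""
    uploads = [split_params(c, personalized_keys)[0] for c in clients]
    global_shared = average_shared(uploads)
    # Each client overwrites its shared layers and keeps its personalized layers.
    return [{**c, **{k: list(v) for k, v in global_shared.items()}} for c in clients]


clients = [
    {"lstm.w": [1.0, 2.0], "mlp.w": [10.0]},
    {"lstm.w": [3.0, 4.0], "mlp.w": [20.0]},
]
updated = plfl_round(clients, personalized_keys={"mlp.w"})
```

In this toy setup the `lstm.w` entries are averaged across clients while each client's `mlp.w` never leaves the device, which is where the bandwidth saving relative to sharing all layers would come from.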

Paper Structure

This paper contains 9 sections, 3 equations, 5 figures, 3 algorithms.

Figures (5)

  • Figure 1: Mean and standard deviation of loads of 12 clients each from the three datasets.
  • Figure 2: Total data transferred per global epoch per client. This consists of a bidirectional communication between the server and client.
  • Figure 3: Average MASE metric on the test set for different client and server algorithms via PL-FL. These are calculated for local training, MLP personalization, and all layers shared (FL). The three datasets used are New York, Illinois, and California.
  • Figure 4: A schematic of ClientOpt for different client algorithms. Note that at the local initialization step, client states $\mathbf{m}$ or $\mathbf{v}$ are reinitialized rather than inheriting stale values to ensure better performance. All vector operations here are elementwise.
  • Figure 5: A schematic of ServerOpt for different server algorithms. All algorithms use a single state except FedAvgAdaptive, which uses $N$ distinct states, one per client. All vector operations here are elementwise.
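
The captions for Figures 4 and 5 mention optimizer states $\mathbf{m}$ and $\mathbf{v}$ that are reinitialized at the start of each local run, with all operations elementwise. The sketch below shows what such a client-side update could look like using the standard Adam rule; it is not the paper's ClientOpt listing, and the function name `client_adam`, the `grad_fn` callback, and the hyperparameter defaults are assumptions made for illustration.

```python
import math
from typing import Callable, List


def client_adam(
    params: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    steps: int,
    lr: float = 0.01,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> List[float]:
    """Run `steps` local Adam updates, starting from fresh optimizer states."""
    # States are reset to zero on every call instead of being carried over
    # from the previous communication round.
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    x = list(params)  # copy so the caller's list is not modified
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # Elementwise first- and second-moment estimates.
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction, then the elementwise parameter update.
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        x = [xi - lr * mh / (math.sqrt(vh) + eps) for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x


def quadratic_grad(x: List[float]) -> List[float]:
    """Gradient of the toy objective sum((x_i - 3)^2), used only for this example."""
    return [2.0 * (xi - 3.0) for xi in x]


start = [0.0, 5.0]
after_one_step = client_adam(start, quadratic_grad, steps=1, lr=0.1)
```

Because the states start at zero and are bias-corrected, the first step moves each coordinate by roughly `lr` toward the minimizer regardless of the gradient's magnitude.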