Physics-informed appliance signatures generator for energy disaggregation
Ilia Kamyshev, Sahar Moghimian Hoosh, Henni Ouerdane
TL;DR
Energy disaggregation faces overfitting when training data are limited and biased toward a subset of appliances. The authors propose two physics-informed appliance-signature generators, one for high-sampling-rate (kHz) and one for low-sampling-rate (Hz) signals, that synthesize a virtually unlimited number of physically plausible signatures without needing input data. They validate the approach with PCA and KL divergence, showing that the synthetic distributions are significantly closer to real data than those of prior work (roughly 1.6× improvement for high-rate and 3.9× for low-rate signals). The method supports scalable, diverse NILM datasets and is released as the Edframe open-source library, enabling broader adoption and improved generalization of disaggregation algorithms.
Abstract
Energy disaggregation is a promising way to obtain detailed information on a household's energy consumption by itemizing its total consumption. However, in real-world applications, overfitting remains a challenging problem for data-driven disaggregation methods. First, the available real-world datasets are biased towards the most frequently used appliances. Second, both real and synthetic publicly available datasets are limited in the number of appliances, which may not be sufficient for a disaggregation algorithm to learn complex relations among different types of appliances and their states. To address the lack of appliance data, we propose two physics-informed data generators: one for high sampling rate signals (kHz) and another for low sampling rate signals (Hz). These generators rely on prior knowledge of the physics of appliance energy consumption, and are capable of simulating a virtually unlimited number of different appliances and their corresponding signatures for any time period. Both methods involve defining a mathematical model, selecting centroids corresponding to individual appliances, sampling model parameters around each centroid, and finally substituting the obtained parameters into the mathematical model. Additionally, by using Principal Component Analysis and Kullback-Leibler divergence, we demonstrate that our methods significantly outperform the previous approaches.
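The four steps named in the abstract (model, centroids, sampling, substitution) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or from the Edframe API: the model is a sum of odd current harmonics, a centroid is a vector of harmonic amplitudes and phases, and parameters are drawn from a Gaussian around the centroid. The function name `synth_signatures` and all its parameters are hypothetical.

```python
import numpy as np

def synth_signatures(centroid_amps, centroid_phases, n_samples,
                     rel_std=0.05, f0=50.0, fs=10_000, n_cycles=2, seed=0):
    """Draw synthetic current waveforms around one appliance centroid.

    Assumed model (illustrative only):
        i(t) = sum_k A_k * sin(2*pi*f0*h_k*t + phi_k),  h_k = 1, 3, 5, ...
    """
    rng = np.random.default_rng(seed)
    amps0 = np.asarray(centroid_amps, dtype=float)
    phases0 = np.asarray(centroid_phases, dtype=float)

    # Step 3: sample model parameters around the centroid (Gaussian spread
    # is an assumption; amplitudes use a relative spread, phases an absolute one).
    amps = rng.normal(amps0, rel_std * np.abs(amps0),
                      size=(n_samples, amps0.size))
    phases = rng.normal(phases0, rel_std, size=(n_samples, phases0.size))

    # Step 4: substitute the sampled parameters into the model.
    t = np.arange(int(n_cycles * fs / f0)) / fs
    harmonics = 2 * np.arange(amps0.size) + 1  # odd harmonics 1, 3, 5, ...
    arg = (2 * np.pi * f0 * harmonics[None, :, None] * t[None, None, :]
           + phases[:, :, None])
    signals = (amps[:, :, None] * np.sin(arg)).sum(axis=1)
    return t, signals

# Step 2: one hypothetical centroid (fundamental plus 3rd and 5th harmonics).
t, signals = synth_signatures([1.0, 0.3, 0.1], [0.0, 0.0, 0.0], n_samples=5)
```

Each row of `signals` is one synthetic signature for the same hypothetical appliance. Repeating the call with different centroids would give signatures for different appliances, which is the sense in which such a generator needs no measured input data.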
