Learning a Reward Function for User-Preferred Appliance Scheduling

Nikolina Čović; Jochen L. Cremer; Hrvoje Pandžić

TL;DR

This work aims to make residential demand response (DR) privacy-preserving and user-friendly by using inverse reinforcement learning to learn the implicit reward that guides a user's appliance scheduling. Demand response is modeled as a Markov decision process, and a reward function $R(s) = \sum_i \alpha_i \phi_i(s)$ is recovered from observed behavior, using historical consumption data and, where necessary, simulated DR data. The recovered reward captures how users trade comfort against economic or environmental benefits. The authors use a Deep Q-Network to generate day-level policies and evaluate generalization across days and across multiple households. The learned reward closely reproduces user-driven demand response in many cases, though accuracy declines as the number of controllable devices grows. The framework offers a privacy-conscious way to increase user participation in demand response by aligning scheduling decisions with learned, user-specific preferences; its acknowledged limitations are reward expressiveness and data availability.
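To make the reward form $R(s) = \sum_i \alpha_i \phi_i(s)$ concrete, here is a minimal Python sketch of a reward that is linear in a set of features. The state fields (`hour`, `preferred_hour`, `price`, `power_kw`), the two feature functions, and the weight values are hypothetical choices made for illustration only; they are not the features or weights used in the paper.

```python
from typing import Callable, Dict, Sequence

# Hypothetical state representation: a flat dict of numeric fields.
State = Dict[str, float]
Feature = Callable[[State], float]


def linear_reward(state: State,
                  weights: Sequence[float],
                  features: Sequence[Feature]) -> float:
    """Compute R(s) = sum_i alpha_i * phi_i(s) for one state."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    return sum(alpha * phi(state) for alpha, phi in zip(weights, features))


# Illustrative (assumed) features, not taken from the paper.
def comfort_feature(state: State) -> float:
    # Penalize running an appliance far from the user's preferred hour.
    return -abs(state["hour"] - state["preferred_hour"])


def cost_feature(state: State) -> float:
    # Penalize the energy cost incurred in this time step.
    return -state["price"] * state["power_kw"]


state = {"hour": 18, "preferred_hour": 20, "price": 0.25, "power_kw": 2.0}
alphas = [0.5, 1.0]  # assumed weights; in IRL these are what gets learned
reward = linear_reward(state, alphas, [comfort_feature, cost_feature])
print(reward)
```

In an inverse reinforcement learning setting the weights $\alpha_i$ are the unknowns: they are adjusted until the policy that is optimal under $R$ matches the observed behavior, whereas this sketch simply fixes them by hand.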

Abstract

Accelerated development of demand response service provision by the residential sector is crucial for reducing carbon emissions in the power sector. Along with infrastructure advancement, encouraging end users to participate is essential. End users highly value their privacy and control, and want to be included in the service design and decision-making process when their daily appliance operation schedules are created. Furthermore, unless they are financially or environmentally motivated, they are generally not prepared to sacrifice their comfort to help balance the power system. In this paper, we present an inverse-reinforcement-learning-based model that helps create end users' daily appliance schedules without asking them to explicitly state their needs and wishes. Because the model uses their past consumption data, end users implicitly participate in creating those schedules and will thus be motivated to continue participating in the provision of demand response services.

Paper Structure

This paper contains 9 sections, 12 equations, 3 figures, 2 tables, and 1 algorithm.

Introduction
Demand Response as a Markov Decision Process
Learning User Preference with IRL
Input Data and Model Setup
Case Studies
Single Day
Adaptability to Different Days
Adaptability to Different Users
Conclusion

Figures (3)

Figure 1: Comparison of demand response provision obtained with the true and the learned reward, evaluated with the true reward.
Figure 2: Comparison of the devices' schedules under the optimal and the learned policy, tested on August 8th.
Figure 3: Distribution of MAEs between optimal and learned demand response provision for 5 households over the test set.
