A Meta-Learning Approach for Multi-Objective Reinforcement Learning in Sustainable Home Environments
Junlin Lu, Patrick Mannion, Karl Mason
TL;DR
This work tackles multi-objective residential energy scheduling under non-stationary renewable generation by integrating meta-learning with MORL. It extends GPI-LS/PD via Reptile-based meta-training to produce initial parameters that enable rapid, few-shot adaptation, and employs an autoencoder-based unsupervised method to detect context shifts. The authors validate their approach on a London residential energy environment, demonstrating higher expected utility, lower electricity cost, and better user comfort with far less training data and fewer training steps, while also producing a denser set of solutions. The study also provides an open-source MORL residential energy environment and a detailed ablation analysis, underscoring the practicality and efficiency of meta-learning for energy management in dynamic settings.
Abstract
Effective residential appliance scheduling is crucial for sustainable living. While multi-objective reinforcement learning (MORL) has proven effective in balancing user preferences in appliance scheduling, traditional MORL struggles with limited data in non-stationary residential settings characterized by variations in renewable generation, where significant context shifts can invalidate previously learned policies. To address these challenges, we extend state-of-the-art MORL algorithms with the meta-learning paradigm, enabling rapid, few-shot adaptation to shifting contexts. Additionally, we employ an auto-encoder (AE)-based unsupervised method to detect environment context changes. We have also developed a residential energy environment to evaluate our method using real-world data from London residential settings. This study not only assesses the application of MORL in residential appliance scheduling but also underscores the effectiveness of meta-learning in energy management. Our top-performing method significantly surpasses the best baseline: the trained model saves 3.28% on electricity bills, increases user comfort by 2.74%, and improves expected utility by 5.9%. Additionally, it reduces the sparsity of solutions by 62.44%. Remarkably, these gains were achieved using 96.71% less training data and 61.1% fewer training steps.
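To make the meta-learning idea concrete, the following is a minimal sketch of a Reptile-style meta-update on a toy problem. It is not the GPI-LS/PD extension evaluated in this work: the one-parameter regression tasks, the function names `inner_sgd` and `reptile`, and all hyperparameters are assumptions chosen only for illustration. Each "context" is a different slope, and the meta-learned initialization is nudged toward the parameters obtained after a few adaptation steps on a sampled context.

```python
import numpy as np

def inner_sgd(theta, slope, rng, steps=5, lr=0.1, batch=16):
    """Adapt to one toy task (fit y = slope * x with model y = w * x) by a few SGD steps."""
    w = theta.copy()
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=batch)
        y = slope * x
        grad = np.mean(2.0 * (w[0] * x - y) * x)  # gradient of the mean squared error w.r.t. w
        w[0] -= lr * grad
    return w

def reptile(task_slopes, meta_iters=500, meta_lr=0.1, seed=0):
    """Reptile-style meta-training: move the initialization toward task-adapted weights."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(1)  # meta-learned initialization (hypothetical single parameter)
    for _ in range(meta_iters):
        slope = rng.choice(task_slopes)        # sample a context (toy stand-in for a task)
        phi = inner_sgd(theta, slope, rng)     # few-step adaptation from the current init
        theta += meta_lr * (phi - theta)       # Reptile update: theta <- theta + eps * (phi - theta)
    return theta

theta0 = reptile(np.array([1.0, 2.0, 3.0]))
print("meta-learned initialization:", theta0)
```

Starting from `theta0` instead of a zero initialization, the same five adaptation steps on a new slope end closer to the target, which is the toy analogue of the few-shot adaptation described above.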
