Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems

Théo Zangato; Aomar Osmani; Pegah Alizadeh

TL;DR

This work introduces a novel Meta-RL framework that combines a bi-level optimization scheme with a hybrid actor-critic architecture designed to improve sample efficiency and inter-task adaptability, and that meta-learns a state feature extractor shared by, and jointly optimized across, the actor and critic networks.

Abstract

Meta-Reinforcement Learning addresses critical limitations of conventional Reinforcement Learning in multi-task and non-stationary environments by enabling fast policy adaptation and improved generalization. We introduce a novel Meta-RL framework that integrates a bi-level optimization scheme with a hybrid actor-critic architecture specifically designed to enhance sample efficiency and inter-task adaptability. To improve knowledge transfer, we meta-learn a shared state feature extractor jointly optimized across the actor and critic networks, which provides efficient representation learning and limits overfitting to individual tasks or dominant profiles. Additionally, we propose a parameter-sharing mechanism between the outer- and inner-loop actor networks to reduce redundant learning and accelerate adaptation when tasks are revisited. The approach is validated on a real-world Building Energy Management Systems dataset covering nearly a decade of temporal and structural variability, for which we propose a task preparation method to promote generalization. Experiments demonstrate effective task adaptation and improved performance over conventional RL and Meta-RL methods.
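Figure 5 compares the meta-gradient norm of the standard Reptile algorithm with the proposed CFE variant. As a rough illustration only, the sketch below shows a batched Reptile outer update over a parameter dictionary in which a single `shared_fe` entry stands in for the feature extractor shared by the actor and critic heads, and it reports the norm of the averaged task update as the meta-gradient norm. The function names `inner_adapt` and `reptile_step`, the toy quadratic task loss (a placeholder for the actual actor-critic losses), and all hyperparameters are hypothetical assumptions, not the authors' implementation.

```python
import numpy as np


def inner_adapt(params, task_target, lr=0.1, steps=5):
    """Hypothetical inner loop: a few gradient steps on a toy quadratic loss
    0.5 * ||w - target||^2, standing in for a task's actor-critic losses."""
    adapted = {k: v.copy() for k, v in params.items()}
    for _ in range(steps):
        for k in adapted:
            grad = adapted[k] - task_target[k]  # gradient of the quadratic loss
            adapted[k] -= lr * grad
    return adapted


def reptile_step(meta_params, task_targets, meta_lr=0.5, **inner_kwargs):
    """Batched Reptile outer update: move the meta-parameters toward the
    average of the task-adapted parameters. Returns the new meta-parameters
    and the norm of the averaged update (the "meta-gradient" norm)."""
    deltas = {k: np.zeros_like(v) for k, v in meta_params.items()}
    for target in task_targets:
        adapted = inner_adapt(meta_params, target, **inner_kwargs)
        for k in deltas:
            deltas[k] += (adapted[k] - meta_params[k]) / len(task_targets)
    new_params = {k: meta_params[k] + meta_lr * deltas[k] for k in meta_params}
    meta_grad_norm = float(np.sqrt(sum(np.sum(d ** 2) for d in deltas.values())))
    return new_params, meta_grad_norm


# Usage: one shared feature extractor plus separate actor and critic heads.
rng = np.random.default_rng(0)
shapes = {"shared_fe": (4, 8), "actor_head": (8, 2), "critic_head": (8, 1)}
meta = {k: np.zeros(s) for k, s in shapes.items()}
tasks = [{k: rng.normal(size=s) for k, s in shapes.items()} for _ in range(3)]

norms = []
for epoch in range(50):
    meta, g = reptile_step(meta, tasks)
    norms.append(g)
```

With identical quadratic task losses, the meta-parameters drift toward the average of the task optima and the update norm shrinks geometrically; in the real setting the inner loop would instead run RL updates on each building task, so this only illustrates the shape of the outer update.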

Paper Structure (12 sections, 7 equations, 5 figures, 1 table)

Introduction
Proposed approach
Preliminaries
Critic Feature Extractor Meta Learning (CFE)
Adaptation Stage
Meta-Training
Inner Loop Actor Weights Reuse (AR)
Meta-RL Task Selection and Evaluation Protocol
Experimental Evaluation
Implementation and Training
Experiments and Results
Conclusion

Figures (5)

Figure 1: Model architecture. Red arrows show interactions between the inner and outer loops, through which each task's knowledge is propagated to the meta-model; dashed arrows indicate gradient flow.
Figure 2: Behavioral clustering results.
Figure 3: Impact of meta-testing initialization on an unseen task (mean of 5 runs). The red dashed lines in (a) mark the end of the early meta-gains shown in (b).
Figure 4: Left: Variance of meta-trained agents across runs. Right: Ablation of Feature Extractors (FE: MLP or TS) and Actor Reuse (AR).
Figure 5: Evolution of the meta-gradient norm across training epochs for the standard Reptile algorithm and the proposed CFE variant.