Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems

Théo Zangato, Aomar Osmani, Pegah Alizadeh

TL;DR

This work introduces a novel Meta-RL framework that integrates a bi-level optimization scheme with a hybrid actor-critic architecture designed to enhance sample efficiency and inter-task adaptability, and that meta-learns a shared state feature extractor jointly optimized across the actor and critic networks.

Abstract

Meta-Reinforcement Learning addresses the critical limitations of conventional Reinforcement Learning in multi-task and non-stationary environments by enabling fast policy adaptation and improved generalization. We introduce a novel Meta-RL framework that integrates a bi-level optimization scheme with a hybrid actor-critic architecture specifically designed to enhance sample efficiency and inter-task adaptability. To improve knowledge transfer, we meta-learn a shared state feature extractor jointly optimized across the actor and critic networks, providing efficient representation learning and limiting overfitting to individual tasks or dominant profiles. Additionally, we propose a parameter-sharing mechanism between the outer- and inner-loop actor networks to reduce redundant learning and accelerate adaptation when tasks are revisited. The approach is validated on a real-world Building Energy Management Systems dataset covering nearly a decade of temporal and structural variability, for which we propose a task preparation method to promote generalization. Experiments demonstrate effective task adaptation and better performance compared to conventional RL and Meta-RL methods.
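
To make the idea of a shared state feature extractor concrete, the following is a minimal forward-pass sketch, not the authors' architecture. It assumes a single tanh layer as the extractor, a softmax policy head, and a linear value head; all names and sizes (`W_shared`, `STATE_DIM`, `FEAT_DIM`, `N_ACTIONS`) are hypothetical. The point it illustrates is only that one set of extractor parameters feeds both heads, so actor and critic losses would both update it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, FEAT_DIM, N_ACTIONS = 6, 8, 3

# One copy of the feature-extractor weights, used by both heads.
W_shared = rng.normal(scale=0.1, size=(STATE_DIM, FEAT_DIM))
# Head-specific weights.
W_actor = rng.normal(scale=0.1, size=(FEAT_DIM, N_ACTIONS))
W_critic = rng.normal(scale=0.1, size=(FEAT_DIM, 1))


def features(state):
    """Shared representation consumed by the actor and the critic."""
    return np.tanh(state @ W_shared)


def actor(state):
    """Softmax policy over discrete actions, computed from shared features."""
    logits = features(state) @ W_actor
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def critic(state):
    """State-value estimate, computed from the same shared features."""
    return (features(state) @ W_critic).squeeze(-1)


batch = rng.normal(size=(4, STATE_DIM))
action_probs = actor(batch)
values = critic(batch)
```

Because `features` is called by both heads, any gradient-based training of either head would reach `W_shared`, which is the sense in which the extractor is jointly optimized in this sketch.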

Paper Structure (12 sections, 7 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Model architecture. Red arrows show the interactions between the inner and outer loops, through which each task's knowledge is propagated to the meta-model; dashed arrows indicate gradient flow.
  • Figure 2: Behavioral clustering results.
  • Figure 3: Meta-testing initialization impact on an unseen task (mean of 5 runs). The red dashed lines in (a) indicate the end of the early meta gains shown in (b).
  • Figure 4: Left: Variance of meta-trained agents across runs. Right: Ablation of Feature Extractors (FE: MLP or TS) and Actor Reuse (AR).
  • Figure 5: Evolution of the meta-gradient norm across training epochs for the standard Reptile algorithm and the proposed CFE variant.
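
For readers unfamiliar with the quantity tracked in Figure 5, the sketch below shows how a meta-gradient norm can be logged for the standard Reptile algorithm. It is a toy illustration under stated assumptions: each task is a quadratic loss with a hypothetical target vector, and the helper names (`inner_adapt`, `reptile`) are invented here. The CFE variant is not sketched, since this listing does not describe it.

```python
import numpy as np


def inner_adapt(theta, target, lr=0.1, steps=5):
    """Inner loop: a few SGD steps on the toy task loss ||phi - target||^2."""
    phi = theta.copy()
    for _ in range(steps):
        grad = 2.0 * (phi - target)
        phi -= lr * grad
    return phi


def reptile(targets, meta_lr=0.5, epochs=30, dim=2):
    """Outer loop: standard Reptile update, logging the meta-gradient norm."""
    theta = np.zeros(dim)
    norms = []
    for _ in range(epochs):
        # Reptile meta-gradient: average gap between the meta-parameters
        # and the task-adapted parameters.
        meta_grad = np.mean(
            [theta - inner_adapt(theta, t) for t in targets], axis=0
        )
        norms.append(float(np.linalg.norm(meta_grad)))
        theta = theta - meta_lr * meta_grad
    return theta, norms


# Hypothetical toy tasks, each defined by its optimum.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
theta, norms = reptile(targets)
```

On these toy tasks the meta-parameters drift toward the mean of the task optima and the logged norm shrinks over epochs; real RL tasks would show noisier curves.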