Constrained Meta Agnostic Reinforcement Learning

Karam Daaboul; Florian Kuhm; Tim Joseph; J. Marius Zoellner

TL;DR

C-MAML enables rapid and efficient task adaptation by incorporating task-specific constraints directly into its meta-algorithm framework during the training phase, which results in safer initial parameters for learning new tasks.

Abstract

Meta-Reinforcement Learning (Meta-RL) aims to acquire meta-knowledge for quick adaptation to diverse tasks. However, applying Meta-RL policies in real-world environments presents a significant challenge: balancing rapid adaptability with adherence to environmental constraints. Our novel approach, Constrained Model Agnostic Meta Learning (C-MAML), merges meta learning with constrained optimization to address this challenge. C-MAML enables rapid and efficient task adaptation by incorporating task-specific constraints directly into its meta-algorithm framework during the training phase. This fusion results in safer initial parameters for learning new tasks. We demonstrate the effectiveness of C-MAML on simulated locomotion tasks of varying complexity with a wheeled robot, highlighting its practicality and robustness in dynamic environments.
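The idea of folding task constraints into the inner adaptation step can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the one-dimensional parameter `theta`, the quadratic reward and cost, the function names, and all step sizes are hypothetical. The inner loop uses a plain Lagrangian update, and the outer loop uses a first-order, Reptile-style meta-update as a stand-in. This is not the authors' implementation, which according to the figure captions uses TRPOLag or CPO in the inner loop.

```python
"""Toy sketch: Lagrangian-constrained inner loop with a first-order meta-update."""


def reward_grad(theta, goal):
    # Gradient of the hypothetical task reward R(theta) = -(theta - goal)^2.
    return -2.0 * (theta - goal)


def cost(theta):
    # Hypothetical cost C(theta) = theta^2; the constraint is C(theta) <= limit.
    return theta * theta


def cost_grad(theta):
    return 2.0 * theta


def inner_adapt(theta, goal, limit, steps=5, lr=0.05, lam_lr=0.5):
    """Adapt theta to one task while penalizing constraint violations."""
    lam = 0.0  # Lagrange multiplier, reset for every task
    for _ in range(steps):
        # Dual ascent: lam grows while the cost exceeds the limit, never below 0.
        lam = max(0.0, lam + lam_lr * (cost(theta) - limit))
        # Primal step on the Lagrangian R(theta) - lam * (C(theta) - limit).
        theta = theta + lr * (reward_grad(theta, goal) - lam * cost_grad(theta))
    return theta


def meta_train(goals, limit, lam_lr=0.5, meta_lr=0.1, iterations=200):
    """First-order meta-update: move the initialization toward the adapted parameters."""
    theta = 0.0
    for _ in range(iterations):
        adapted = [inner_adapt(theta, g, limit, lam_lr=lam_lr) for g in goals]
        theta += meta_lr * (sum(adapted) / len(adapted) - theta)
    return theta


if __name__ == "__main__":
    goals = [2.0, 3.0, 4.0]
    print("constrained init:  ", meta_train(goals, limit=1.0))
    # lam_lr=0.0 keeps the multiplier at zero, i.e. the constraint is ignored.
    print("unconstrained init:", meta_train(goals, limit=1.0, lam_lr=0.0))
```

In this toy setting every goal lies outside the feasible region, so the unconstrained initialization drifts toward the mean goal, while the constrained one stays much closer to the cost limit. That is the qualitative effect the abstract describes as safer initial parameters.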

Paper Structure (30 sections, 37 equations, 13 figures, 1 algorithm)

Introduction
Related Works
Meta Learning
Safe Reinforcement Learning
Preliminaries
Constrained Model Agnostic Meta Reinforcement Learning
Incorporating Constraints in the Inner Loop
Optimizing Meta-Parameter in the Outer Loop
Enhancing Safety of the Meta-Policy
Evaluation
Task Setup
Importance of Using $\eta$ and Safety Critic
Safety Adaptation Across Task Spectrum
Model Agnosticity in C-MAML Framework
Conclusion
...and 15 more sections

Figures (13)

Figure 1: Visual representation of the Constrained Model Agnostic Meta Learning (C-MAML) framework. This schematic showcases the iterative optimization process where the meta-policy is trained across different tasks. Task-specific policies ($\pi_1, \pi_2, \pi_3$) are adjusted within their respective constraint surfaces $C_1, C_2, C_3$, each with a dedicated safety boundary $d_1, d_2, d_3$.
Figure 2: Illustrations of the action space and two different tasks of the used environment.
Figure 3: Evaluation of $\eta$ on policy safety and adaptability: On the left, meta-training performance across 106 tasks, showing the effect of an adaptive $\eta$ (employing safety critic) versus $\eta = 0$ (no safety critic) on maintaining safer cost margins. On the right, fine-tuning phase performance, illustrating how an adaptive $\eta$ contributes to consistent adherence to the $d=5$ cost threshold compared to the absence of a safety critic.
Figure 4: Mean episode return and costs during fine-tuning across tasks: C-MAML with TRPOLag in the inner loop is depicted in blue, the randomly initialized policy in orange, the TRPOLag pretrained policy in green, and the MAML policy with TRPO in the inner loop in red.
Figure 5: Mean episode return and costs during fine-tuning across tasks: C-MAML with CPO in the inner loop is depicted in blue, the randomly initialized policy in orange, the CPO pretrained policy in green, and the MAML policy with TRPO in the inner loop in red. Each of these initializations is fine-tuned using CPO.
...and 8 more figures
