Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation
Yang Zhang, Huiwen Yan, Mushuang Liu
TL;DR
Directed-MAML addresses the high computational cost and convergence difficulties of gradient-based meta-RL by adding a task-directed pre-adaptation step, which uses a representative medium task to approximate the effect of second-order gradients with a first-order update. The method estimates the medium-environment parameters from the task distribution and performs a preliminary gradient step on trajectories from this task, which guides subsequent meta-gradients toward a region near the global optimum. Empirical results on CartPole-v1, LunarLander-v2, and a two-vehicle intersection-crossing task show faster convergence and improved computational efficiency, and further experiments show that the task-directed idea also benefits other MAML-based algorithms such as FOMAML and Meta-SGD. The approach is presented as model-agnostic and broadly applicable to gradient-based meta-learning, offering practical gains in training time and scalability for meta-RL systems.
Abstract
Model-Agnostic Meta-Learning (MAML) is a versatile meta-learning framework applicable to both supervised learning and reinforcement learning (RL). However, applying MAML to meta-reinforcement learning (meta-RL) presents notable challenges. First, MAML relies on second-order gradient computations, which incur significant computational and memory overhead. Second, the nested structure of the optimization increases the problem's complexity and makes convergence to a global optimum harder. To overcome these limitations, we propose Directed-MAML, a novel task-directed meta-RL algorithm. Before the second-order gradient step, Directed-MAML applies an additional first-order task-directed approximation to estimate the effect of second-order gradients, thereby accelerating convergence to the optimum and reducing computational cost. Experimental results demonstrate that Directed-MAML surpasses MAML-based baselines in computational efficiency and convergence speed on CartPole-v1, LunarLander-v2, and a two-vehicle intersection-crossing task. Furthermore, we show that task-directed approximation can be effectively integrated into other meta-learning algorithms, such as First-Order Model-Agnostic Meta-Learning (FOMAML) and Meta Stochastic Gradient Descent (Meta-SGD), yielding improved computational efficiency and convergence speed.
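To make the ordering of the two updates concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes a scalar parameter, a quadratic per-task loss in place of an RL return, a uniform task distribution whose midpoint stands in for the medium task, and a pre-adaptation step applied at every meta-iteration. The function names, step sizes, and batch size are all hypothetical. The sketch only illustrates the control flow (a first-order step on the medium task, followed by the usual meta-gradient step) and does not reproduce the paper's results.

```python
import random


def task_loss_grad(theta, c):
    """Gradient of the toy task loss L_c(theta) = 0.5 * (theta - c)**2."""
    return theta - c


def maml_meta_grad(theta, c, alpha):
    """Exact MAML meta-gradient for the toy loss.

    Inner step:  theta' = theta - alpha * (theta - c)
    Outer loss:  L_c(theta') = 0.5 * (1 - alpha)**2 * (theta - c)**2
    The factor (1 - alpha) contributed by d(theta')/d(theta) is where the
    second-order (Hessian) term enters; here the Hessian is simply 1.
    """
    return (1.0 - alpha) ** 2 * (theta - c)


def directed_pre_adapt(theta, c_medium, beta):
    """Hypothetical task-directed step: one first-order update on the medium task."""
    return theta - beta * task_loss_grad(theta, c_medium)


def meta_train(theta, task_lo, task_hi, steps, alpha=0.1, meta_lr=0.05,
               beta=0.5, batch_size=8, directed=True, seed=0):
    """Run toy meta-training; returns the final parameter and its history."""
    rng = random.Random(seed)
    # Assumption: the medium task is the midpoint of the uniform task range.
    c_medium = 0.5 * (task_lo + task_hi)
    history = []
    for _ in range(steps):
        if directed:
            # First-order pre-adaptation before the meta-gradient step.
            theta = directed_pre_adapt(theta, c_medium, beta)
        # Sample a batch of tasks and average their meta-gradients.
        batch = [rng.uniform(task_lo, task_hi) for _ in range(batch_size)]
        grad = sum(maml_meta_grad(theta, c, alpha) for c in batch) / batch_size
        theta = theta - meta_lr * grad
        history.append(theta)
    return theta, history


if __name__ == "__main__":
    theta_directed, _ = meta_train(10.0, 0.0, 2.0, steps=5, directed=True)
    theta_plain, _ = meta_train(10.0, 0.0, 2.0, steps=5, directed=False)
    print("directed:", theta_directed, "plain:", theta_plain)
```

In this toy setting the directed variant ends closer to the midpoint of the task range after the same number of meta-iterations, largely because the pre-adaptation step takes a larger move toward the medium task. Whether a similar effect holds for policy-gradient objectives is the empirical question the paper addresses.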
