Evaluating Model-Agnostic Meta-Learning on MetaWorld ML10 Benchmark: Fast Adaptation in Robotic Manipulation Tasks
Sanjar Atamuradov
TL;DR
The paper assesses Model-Agnostic Meta-Learning paired with Trust Region Policy Optimization (MAML-TRPO) on the MetaWorld ML10 robotic manipulation benchmark to test cross-task generalization and rapid adaptation. By training on the ten ML10 training tasks and evaluating on five held-out tasks, the study demonstrates meaningful one-shot adaptation after a single gradient step, but identifies a generalization gap: training performance continues to improve while test performance plateaus. Task-level analysis reveals high variance in adaptation success across manipulation skills, with some tasks learned well and others failing to transfer, especially those requiring complex, sequential behaviors. The work highlights the promise of gradient-based meta-learning for robotics while underscoring its limitations in handling task diversity, long-horizon control, and cross-task robustness. It suggests task-aware adaptation, structured policy architectures, and hybrid meta-learning approaches as avenues for future research. Overall, the framework and findings establish a baseline for systematic evaluation of meta-RL on diverse robotic benchmarks and motivate real-world validation and methodological extensions.
Abstract
Meta-learning algorithms enable rapid adaptation to new tasks with minimal data, a critical capability for real-world robotic systems. This paper evaluates Model-Agnostic Meta-Learning (MAML) combined with Trust Region Policy Optimization (TRPO) on the MetaWorld ML10 benchmark, a challenging suite of ten diverse robotic manipulation tasks. We implement and analyze MAML-TRPO's ability to learn a universal initialization that facilitates few-shot adaptation across semantically different manipulation behaviors including pushing, picking, and drawer manipulation. Our experiments demonstrate that MAML achieves effective one-shot adaptation with clear performance improvements after a single gradient update, reaching final success rates of 21.0% on training tasks and 13.2% on held-out test tasks. However, we observe a generalization gap that emerges during meta-training, where performance on test tasks plateaus while training task performance continues to improve. Task-level analysis reveals high variance in adaptation effectiveness, with success rates ranging from 0% to 80% across different manipulation skills. These findings highlight both the promise and current limitations of gradient-based meta-learning for diverse robotic manipulation, and suggest directions for future work in task-aware adaptation and structured policy architectures.
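The central mechanism described above, a shared initialization tuned so that a single gradient step adapts it to a task, can be illustrated on a toy problem. The sketch below is not the paper's MAML-TRPO implementation: it assumes a scalar parameter in place of a policy, quadratic per-task losses in place of manipulation returns, and plain gradient descent in place of the TRPO outer step. All names (`inner_adapt`, `post_adaptation_loss`, `meta_gradient`, `meta_train`, `alpha`, `beta`) are hypothetical.

```python
# Toy MAML sketch (assumption: scalar parameter, quadratic task losses,
# plain gradient descent for the outer loop instead of TRPO).

def inner_adapt(theta, target, alpha):
    """One inner-loop gradient step on the task loss 0.5 * (theta - target)**2."""
    grad = theta - target
    return theta - alpha * grad


def post_adaptation_loss(theta, target, alpha):
    """Task loss evaluated after a single adaptation step from theta."""
    adapted = inner_adapt(theta, target, alpha)
    return 0.5 * (adapted - target) ** 2


def meta_gradient(theta, targets, alpha):
    """Exact gradient of the mean post-adaptation loss with respect to theta.

    The adapted parameter is theta - alpha * (theta - c), so the adapted
    error is (1 - alpha) * (theta - c) and the derivative of the loss is
    (1 - alpha)**2 * (theta - c) for each task target c.
    """
    per_task = [(1 - alpha) ** 2 * (theta - c) for c in targets]
    return sum(per_task) / len(per_task)


def meta_train(targets, alpha=0.5, beta=0.5, steps=200, theta=0.0):
    """Outer loop: move the initialization to reduce post-adaptation loss."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, targets, alpha)
    return theta
```

In this toy setting, with task targets 1.0, 2.0, and 3.0, the learned initialization settles near their mean, and one inner step from that point lowers the loss on each task whose target differs from it. Real policies and reward landscapes offer no such closed-form meta-gradient, which is part of why the outer step in practice relies on a method such as TRPO.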
