Evaluating Model-Agnostic Meta-Learning on MetaWorld ML10 Benchmark: Fast Adaptation in Robotic Manipulation Tasks

Sanjar Atamuradov

TL;DR

The paper assesses Model-Agnostic Meta-Learning paired with Trust Region Policy Optimization (MAML-TRPO) on the MetaWorld ML10 robotic manipulation benchmark to test cross-task generalization and rapid adaptation. Training on the ten ML10 training tasks and evaluating on the five held-out test tasks, the study demonstrates meaningful one-shot adaptation after a single gradient step, but identifies a generalization gap: training performance continues to improve while test performance plateaus. Task-level analysis reveals high variance in adaptation success across manipulation skills, with some tasks learned well and others failing to transfer, especially complex, sequential behaviors. The work highlights the promise of gradient-based meta-learning for robotics while underscoring limitations in handling task diversity, long-horizon control, and cross-task robustness, and it suggests task-aware adaptation, structured policy architectures, and hybrid meta-learning approaches as avenues for future research. Overall, the framework and findings establish a baseline for systematic evaluation of meta-RL on diverse robotic benchmarks and motivate real-world validation and methodological extensions.

Abstract

Meta-learning algorithms enable rapid adaptation to new tasks with minimal data, a critical capability for real-world robotic systems. This paper evaluates Model-Agnostic Meta-Learning (MAML) combined with Trust Region Policy Optimization (TRPO) on the MetaWorld ML10 benchmark, a challenging suite of ten diverse robotic manipulation tasks. We implement and analyze MAML-TRPO's ability to learn a universal initialization that facilitates few-shot adaptation across semantically different manipulation behaviors including pushing, picking, and drawer manipulation. Our experiments demonstrate that MAML achieves effective one-shot adaptation with clear performance improvements after a single gradient update, reaching final success rates of 21.0% on training tasks and 13.2% on held-out test tasks. However, we observe a generalization gap that emerges during meta-training, where performance on test tasks plateaus while training task performance continues to improve. Task-level analysis reveals high variance in adaptation effectiveness, with success rates ranging from 0% to 80% across different manipulation skills. These findings highlight both the promise and current limitations of gradient-based meta-learning for diverse robotic manipulation, and suggest directions for future work in task-aware adaptation and structured policy architectures.
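
To make the "single gradient update" in the abstract concrete, the sketch below shows the inner-loop adaptation step that MAML applies to a shared initialization. It is only an illustration under stated assumptions: the quadratic toy loss, the helper names `inner_adapt` and `make_task`, and the step size `alpha` are hypothetical and are not taken from the paper, which uses policy-gradient losses on MetaWorld and a TRPO outer loop that is not shown here.

```python
import numpy as np

def inner_adapt(theta, grad_fn, alpha=0.1, steps=1):
    """Return task-adapted parameters after `steps` plain gradient steps.

    theta   : shared initialization (what meta-training would learn)
    grad_fn : gradient of the task loss with respect to the parameters
    alpha   : inner-loop step size (hypothetical value)
    """
    adapted = theta.copy()
    for _ in range(steps):
        adapted = adapted - alpha * grad_fn(adapted)
    return adapted

def make_task(target):
    """Toy stand-in for a task: loss = 0.5 * ||theta - target||^2."""
    target = np.asarray(target, dtype=float)
    loss = lambda th: 0.5 * float(np.sum((th - target) ** 2))
    grad = lambda th: th - target
    return loss, grad

theta0 = np.zeros(2)                      # shared initialization
loss, grad = make_task([1.0, -2.0])       # one sampled "task"
theta1 = inner_adapt(theta0, grad, alpha=0.5, steps=1)

print("loss before adaptation:", loss(theta0))
print("loss after one step:   ", loss(theta1))
```

In this toy setting a single step already lowers the task loss. The meta-training outer loop, omitted here, would adjust `theta0` so that this one-step improvement is as large as possible on average across tasks.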

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: Policy loss before (red) and after (green) adaptation across 300 training iterations on MetaWorld ML10. Post-adaptation consistently improves performance, demonstrating successful few-shot adaptation from the learned initialization.
  • Figure 2: Average return over 1-3 gradient update steps for four test tasks, comparing MAML (blue) with an untrained baseline (orange). Most tasks show strong one-shot adaptation, but for some tasks performance degrades with additional steps. Subplots: (top left) Lever-pull, (top right) Shelf-place, (bottom left) Door-close, (bottom right) Drawer-open.
  • Figure 3: Success rate (%) on training and test tasks over 300 meta-training iterations. Early generalization to test tasks is overtaken by specialization on training tasks. Final success rates: 21.0% (training) and 13.2% (test).
  • Figure 4: Final success rate (%) by training task for MAML-TRPO on MetaWorld ML10. High variance across tasks, with some mastered (door-open) and others showing minimal learning (window-open).
  • Figure 5: Final success rate (%) by test task for MAML-TRPO. Results reflect MAML's ability to generalize to unseen tasks based on training experience.