Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning

Donglin Zhan, Leonardo F. Toso, James Anderson

TL;DR

This paper tackles sample inefficiency in model-agnostic meta-RL by introducing a derivative-free, coreset-based task selection scheme that picks a weighted subset of tasks to maximize diversity in gradient space. It grounds the method in submodular optimization, provides an ergodic convergence analysis for non-concave task rewards, and demonstrates sample-efficiency gains both in general MAML-RL and in the MAML-LQR control setting, where a gradient-dominance assumption yields a sample complexity reduction proportional to $O(\log(1/\epsilon))$. The approach provably reduces the number of training samples required to reach an $\epsilon$-near-stationary solution, and numerical experiments on deep RL benchmarks and LQR variants corroborate this reduction. Overall, task selection emerges as a key lever for scalable, fast-adapting meta-RL systems with practical impact in robotics and control.

Abstract

We study task selection to enhance sample efficiency in model-agnostic meta-reinforcement learning (MAML-RL). Traditional meta-RL typically assumes that all available tasks are equally important, which can lead to task redundancy when they share significant similarities. To address this, we propose a coreset-based task selection approach that selects a weighted subset of tasks based on how diverse they are in gradient space, prioritizing the most informative and diverse tasks. Such task selection reduces the number of samples needed to find an $\epsilon$-close stationary solution by a factor of $O(1/\epsilon)$. Consequently, it guarantees faster adaptation to unseen tasks while focusing training on the most relevant tasks. As a case study, we incorporate task selection into MAML-LQR (Toso et al., 2024b), and prove a sample complexity reduction proportional to $O(\log(1/\epsilon))$ when the task-specific costs also satisfy gradient dominance. Our theoretical guarantees underscore task selection as a key component for scalable and sample-efficient meta-RL. We numerically validate this trend across multiple RL benchmark problems, illustrating the benefits of task selection beyond the LQR baseline.
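To make the idea of selecting a weighted subset of tasks by diversity in gradient space concrete, here is a minimal sketch of greedy selection under a facility-location objective, a standard submodular surrogate. The function name `select_coreset`, the choice of Euclidean distance, and the use of cluster sizes as weights are illustrative assumptions, not necessarily the objective or weighting used in the paper.

```python
import numpy as np

def select_coreset(task_grads, k):
    """Greedily pick k tasks whose gradients best cover all task gradients.

    task_grads: array of shape (n_tasks, dim), one (estimated) gradient per task.
    Returns (selected indices, weights); each weight counts how many tasks are
    closest to that selected task in gradient space (an illustrative choice).
    """
    n = task_grads.shape[0]
    # Pairwise Euclidean distances between task gradients.
    diffs = task_grads[:, None, :] - task_grads[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    # Non-negative similarity for the facility-location objective.
    sim = dist.max() - dist

    selected = []
    best_cover = np.zeros(n)  # best similarity of each task to the selected set
    for _ in range(k):
        # Objective after adding candidate j: sum_i max(best_cover_i, sim_ij).
        scores = np.maximum(best_cover[:, None], sim).sum(axis=0)
        if selected:
            scores[selected] = -np.inf  # never pick the same task twice
        j = int(np.argmax(scores))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])

    # Assign every task to its most similar selected task; weights are cluster sizes.
    assignment = np.argmax(sim[:, selected], axis=1)
    weights = np.bincount(assignment, minlength=k)
    return selected, weights
```

With two well-separated groups of task gradients and `k = 2`, this sketch picks one representative per group and weights each by its group size, so redundant tasks are summarized rather than all sampled.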

Paper Structure

This paper contains 20 sections, 8 theorems, 72 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

(Stationary solution) Suppose the assumptions on the Lipschitz properties of the MAML task-specific objectives, the uniform bound on the task-specific gradients, and the submodular function upper bound are satisfied. In addition, suppose the number of samples and smoothing radius are set according to … with $\sigma^2 := \left(d\beta J_{\max}\right)^2 + \left(\epsilon + \phi\right)^2$ and $b := d\beta J_{\max} + \epsilon \ldots$
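The theorem's dependence on a number of samples, a smoothing radius, and the dimension $d$ is typical of derivative-free (zeroth-order) gradient estimation. As a rough illustration only, the sketch below implements a standard two-point estimator with directions drawn uniformly from the unit sphere. The names `zeroth_order_grad`, `J`, `radius`, and `n_samples` are hypothetical, and the paper's exact estimator and parameter settings may differ.

```python
import numpy as np

def zeroth_order_grad(J, theta, radius, n_samples, rng):
    """Two-point zeroth-order estimate of the gradient of J at theta.

    J: callable mapping a parameter vector to a scalar cost or reward.
    radius: smoothing radius of the random perturbations.
    n_samples: number of random directions averaged over.
    """
    d = theta.size
    grad = np.zeros(d)
    for _ in range(n_samples):
        # Direction drawn uniformly from the unit sphere in R^d.
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        # Symmetric finite difference of J along u.
        delta = (J(theta + radius * u) - J(theta - radius * u)) / (2.0 * radius)
        grad += delta * u
    # The factor d compensates for E[u u^T] = I / d on the unit sphere.
    return d * grad / n_samples
```

Only evaluations of `J` are needed, so the same routine applies when `J` is a rollout-based return estimate; a smaller radius reduces smoothing bias, while more samples reduce variance.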

Figures (3)

  • Figure 1: Comparison of coreset MAML-RL (this work) and MAML-RL on the Walker2D MuJoCo environment (see the numerical experiments section for details).
  • Figure 2: Reward comparison of Algorithm 1 and vanilla MAML (Finn et al., 2017) on Cart Pole (left), Hopper (middle), and Walker2D (right) tasks (MuJoCo).
  • Figure 3: Optimality gap of Algorithm 1 in the MAML-LQR setting with respect to iterations and number of samples.

Theorems & Definitions (12)

  • Definition 1: Submodularity (Nemhauser et al., 1978)
  • Definition 2
  • Theorem 1
  • Corollary 1
  • Definition 3
  • Lemma 1: Lemma 4 from Toso et al. (2024)
  • Lemma 2
  • Theorem 2
  • Corollary 2
  • Lemma 3: (Tropp, 2012)
  • ...and 2 more
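
For reference, the notion of submodularity named in Definition 1 is usually stated as a diminishing-returns property. The statement below is the standard textbook form in generic notation; the symbols $F$, $V$, $A$, $B$, and $e$ are assumptions for illustration and may differ from the paper's notation.

```latex
% A set function F : 2^V \to \mathbb{R} is submodular if adding an element
% to a smaller set helps at least as much as adding it to a larger one.
\[
  F(A \cup \{e\}) - F(A) \;\ge\; F(B \cup \{e\}) - F(B)
  \qquad \text{for all } A \subseteq B \subseteq V,\; e \in V \setminus B .
\]
% F is monotone if F(A) \le F(B) whenever A \subseteq B. For monotone
% submodular F with F(\emptyset) = 0, greedy selection of k elements attains
% at least a (1 - 1/e) fraction of the best value achievable with k elements.
```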