Parametric-Task MAP-Elites

Timothée Anne; Jean-Baptiste Mouret

TL;DR

Parametric-Task MAP-Elites (PT-ME) tackles continuous multi-task optimization by learning a mapping $G: \Theta \rightarrow \mathcal{X}$ that returns the optimal solution $x^*_{\theta}$ for any task parameter $\theta$. It achieves this by sampling a new task each iteration, employing a dual variation strategy (SBX with a bandit-tuned tournament and a local linear-regression operator), and storing all evaluations for distillation into a neural network predictor. Across 10-DoF Arm, Archery, and Door-Pulling, PT-ME outperforms baselines including PPO, CMA-ES, MT-ME, and ablations in both coverage (MR-QD-Score) and solution quality, demonstrating scalable, data-efficient parametric-task learning. The approach enables fast inference on unseen tasks via distillation and suggests directions for scaling to larger task spaces and alternative regression models.

Abstract

Optimizing a set of functions simultaneously by leveraging their similarity is called multi-task optimization. Current black-box multi-task algorithms only solve a finite set of tasks, even when the tasks originate from a continuous space. In this paper, we introduce Parametric-Task MAP-Elites (PT-ME), a new black-box algorithm for continuous multi-task optimization problems. This algorithm (1) solves a new task at each iteration, effectively covering the continuous space, and (2) exploits a new variation operator based on local linear regression. The resulting dataset of solutions makes it possible to create a function that maps any task parameter to its optimal solution. We show that PT-ME outperforms all baselines, including the deep reinforcement learning algorithm PPO, on two parametric-task toy problems and a robotic problem in simulation.
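To make the idea of a variation operator based on local linear regression concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar task parameter, an archive stored as a plain list of (task parameter, solution) pairs, a k-nearest-neighbour choice of local data, and additive Gaussian noise on the prediction; the function names `local_linear_predict` and `propose`, and the parameters `k` and `sigma`, are hypothetical and chosen only for illustration.

```python
import random


def local_linear_predict(archive, theta, k=5):
    """Predict a solution for task `theta` from the k nearest archived tasks.

    archive: list of (task_parameter, solution) pairs, where the task
    parameter is a float and the solution is a list of floats (assumed layout).
    """
    # Select the k archived entries closest to theta in task space.
    neighbours = sorted(archive, key=lambda entry: abs(entry[0] - theta))[:k]
    thetas = [t for t, _ in neighbours]
    mean_t = sum(thetas) / len(thetas)
    var_t = sum((t - mean_t) ** 2 for t in thetas)

    prediction = []
    for j in range(len(neighbours[0][1])):
        xs = [x[j] for _, x in neighbours]
        mean_x = sum(xs) / len(xs)
        # Ordinary least-squares slope for solution dimension j;
        # fall back to the neighbour mean if all neighbour tasks coincide.
        if var_t == 0:
            slope = 0.0
        else:
            cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(thetas, xs))
            slope = cov / var_t
        prediction.append(mean_x + slope * (theta - mean_t))
    return prediction


def propose(archive, theta, k=5, sigma=0.01, rng=random):
    """Candidate for `theta`: local linear prediction plus Gaussian noise (assumed)."""
    return [v + rng.gauss(0.0, sigma) for v in local_linear_predict(archive, theta, k)]
```

In a loop of the kind the abstract describes, a new `theta` would be sampled at each iteration, a candidate produced by an operator such as `propose`, and the evaluated result stored so that the growing dataset can later be used to fit a task-to-solution mapping.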

Paper Structure (28 sections, 3 equations, 4 figures, 2 algorithms)

Introduction
Problem formulation
Related Work
Parametric Programming
Multi-Task Optimization
Multi-Task MAP-Elites
Reinforcement Learning
Method
The Archive of Elites
Variation Operator 1: SBX with tournament
Variation Operator 2: Linear Regression
PT-ME Algorithm
Distillation
Experiments
Considered Domains
...and 13 more sections

Figures (4)

Figure 1: (a) Schematic view of the 10-DoF Arm problem. (b) Archive examples extracted from one run. (c) QD-Score for different resolutions (line $=$ median of 20 replications and shaded area $=$ first and third quantiles). (d) Multi-Resolution QD-Score.
Figure 2: (a) Schematic view of the Archery problem. (b) Archive examples extracted from one run. (c) QD-Score for different resolutions (line $=$ median of 20 replications and shaded area $=$ first and third quantiles). (d) Multi-Resolution QD-Score.
Figure 3: (a) Door-Pulling visualization. (b) Examples of the archive found by PT-ME and PPO. (c) The QD-Score of the two methods for different archive resolutions (the line is the median of 10 replications, and the shaded area is between the first and third quantiles). (d) The Multi-Resolution QD-Score of the two methods for $10$ replications.
Figure 4: (a) Inference score, i.e., mean fitness over $10\,000$ ($1\,000$ for Door-Pulling) evenly spread new tasks, for PT-ME distillation, MT-ME distillation, and PPO on the three parametric-task optimization problems (the line is the median of 20 (10 for Door-Pulling) replications, and the shaded area is between the first and third quantiles). (b) Box plots with PT-ME's best resolution ($***$: p-value $< 0.001$; $**$: p-value $< 0.01$).
