Few-shot Quality-Diversity Optimization

Achkan Salehi, Alexandre Coninx, Stephane Doncieux

TL;DR

The paper addresses the data-inefficiency of Quality-Diversity optimization in deceptive or sparse-reward environments by introducing FAERY, a gradient-free meta-learning framework that learns a prior population $\mathcal{P}$ from a task distribution $\mathcal{T}$. FAERY optimizes two meta-objectives, $f_0$ (polyvalence) and $f_1$ (adaptation speed), based on the evolution paths of multiple QD runs and updates $\mathcal{P}$ via Pareto optimization to enable rapid adaptation on unseen tasks $t_{new}$. This approach is model- and gradient-agnostic, preserving the flexibility of QD while substantially reducing the number of generations needed to reach solutions, in both sparse and dense reward settings. Experimental results across randomly generated mazes and Meta-World manipulation tasks demonstrate major time savings and improved transfer across tasks, highlighting FAERY’s potential for continual and multi-task learning in robotics.

Abstract

In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning-based control. A notable exception, where to the best of our knowledge little to no effort has been made in this direction, is Quality-Diversity (QD) optimization. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample-inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.

Paper Structure

This paper contains 11 sections, 4 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Example of an evolution forest associated with a single toy QD instance $Q_i$, initialized with the prior population $\{A, B, C\}$. Each edge indicates a parent-offspring relationship between two nodes. The $S_i$ are the solutions found by the QD algorithm. In this example, $f_0^i(A)=1, f_0^i(B)=2, f_0^i(C)=2$ and $f_1^i(A)=2, f_1^i(B)=3.5, f_1^i(C)=3$.
  • Figure 2: Examples of random mazes sampled from the maze distributions (top row), and the results of the proposed method (middle and bottom rows). The first and second columns respectively correspond to $8\times 8$ and $10\times 10$ mazes. The horizontal axis of the plots in the middle and bottom rows indicates the number of generations for which the prior population has been optimized by FAERY (i.e. the number of meta-updates), which should not be confused with the number of generations needed for a single QD algorithm to converge. On this axis, generation $0$ corresponds to a QD optimization that is performed from scratch, as no priors have been learned up to this point. All train/test environments that are sampled at generation $i$ on this axis have their corresponding QD instances initialized using the priors obtained from generation $i-1$. (Top row) Example mazes that are sampled from the $8 \times 8$ and $10 \times 10$ maze distributions. (Middle row) The ratio of environments among the $M$ sampled ones that the QD instances are able to solve. (Bottom row) The average number of generations necessary to solve the tasks (that are solvable at a given generation).
  • Figure 3: Each of the two columns corresponds to an example task from Meta-World. (Top row) Example frames from the basketball-v2 and assembly-v2 tasks from the benchmark. (Middle row) The ratio of environments among the $M$ sampled ones that the QD instances are able to solve at each generation of FAERY. For example, on the basketball-v2 task, we see that without optimized priors (generation 0 on the horizontal axis), only $60\%$ of the $M$ QD methods are successful. After about $100$ meta-updates to the prior population, however, all QD methods successfully solve their environments. (Bottom row) The average number of generations necessary to solve tasks (that are solvable at a given generation). As in the maze experiments, the horizontal axis of the plots is the number of generations for which the prior population is optimized by FAERY (i.e. the number of meta-updates), which should not be confused with the number of generations needed for a single QD algorithm to converge. All train/test environments that are sampled at generation $i$ on this axis have their corresponding QD instances initialized using the priors obtained from generation $i-1$.
  • Figure 4: (a) The toy environment distribution used to demonstrate the complementarity between the two fitnesses. (b) Results of a run in which only adaptivity ($f_1$) is optimized. (c) Results of a single run where only polyvalence ($f_0$) is maximized. (d) Results when both $f_0$ and $f_1$ are jointly maximized. See the ablation section for details.
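
The values quoted in the Figure 1 caption can be reproduced with a small sketch. It rests on one reading of the caption that this page does not spell out: $f_0^i(p)$ is taken to be the number of solutions descending from prior $p$ in the evolution forest, and $f_1^i(p)$ the mean depth (number of parent-offspring edges) from $p$ to those solutions. The forest below, the node names (`a1`, `b1`, ...), and the helper functions are hypothetical and chosen only so that the numbers match the caption; the sign convention of $f_1$ and the aggregation over several QD instances are not covered here.

```python
from collections import defaultdict

# Hypothetical toy forest (child -> parent); priors A, B, C are roots.
# Intermediate node names are illustrative, not taken from the paper.
parent = {
    "A": None, "B": None, "C": None,
    "a1": "A", "S1": "a1",                 # S1 at depth 2 under A
    "b1": "B", "b2": "b1", "S2": "b2",     # S2 at depth 3 under B
    "b3": "b2", "S3": "b3",                # S3 at depth 4 under B
    "c1": "C", "c2": "c1", "S4": "c2",     # S4 at depth 3 under C
    "S5": "c2",                            # S5 at depth 3 under C
}
solutions = ["S1", "S2", "S3", "S4", "S5"]


def root_and_depth(node, parent):
    """Walk up to the prior (root) and count the edges traversed."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return node, depth


def meta_objectives(parent, solutions):
    """Assumed reading: f0 = number of descendant solutions per prior,
    f1 = mean depth of those solutions (inf if the prior has none)."""
    counts = defaultdict(int)
    depth_sums = defaultdict(float)
    for s in solutions:
        root, depth = root_and_depth(s, parent)
        counts[root] += 1
        depth_sums[root] += depth
    roots = [n for n, p in parent.items() if p is None]
    f0 = {r: counts[r] for r in roots}
    f1 = {r: depth_sums[r] / counts[r] if counts[r] else float("inf")
          for r in roots}
    return f0, f1


f0, f1 = meta_objectives(parent, solutions)
print(f0)  # {'A': 1, 'B': 2, 'C': 2}
print(f1)  # {'A': 2.0, 'B': 3.5, 'C': 3.0}
```

Under this reading, a prior with a high $f_0$ and a shallow mean depth is one that leads to many solutions in few mutations, which matches the polyvalence and adaptation-speed descriptions in the TL;DR.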