Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

Maxence Faldor; Félix Chalumeau; Manon Flageat; Antoine Cully

TL;DR

This paper introduces DCRL-MAP-Elites, an extension of DCG-MAP-Elites that uses the descriptor-conditioned actor as a generative model to produce diverse solutions, which are injected into the offspring batch at each generation.

Abstract

A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, thus limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, advanced methods like PGA-MAP-Elites and DCG-MAP-Elites have been developed, which combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly enhancing the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods have successfully leveraged the trained critic to guide more effective mutations, the potential of the trained actor remains underutilized in improving both the quality and diversity of the evolved population. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis that sheds light on the synergies between the different variation operators and explains the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
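The idea of injecting generated solutions into the offspring batch can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the task in `evaluate`, the grid archive, and especially `generate_for_descriptor` (a hypothetical stand-in for the trained descriptor-conditioned actor) are invented for illustration, and the critic-guided variation used by PGA-MAP-Elites and DCG-MAP-Elites is omitted, leaving only random mutation plus injection.

```python
import random


def evaluate(x):
    """Toy task (assumption): fitness is the negative squared norm,
    the descriptor is the first two genes clipped to [0, 1]."""
    fitness = -sum(v * v for v in x)
    descriptor = tuple(min(max(v, 0.0), 1.0) for v in x[:2])
    return fitness, descriptor


def cell_of(descriptor, bins):
    """Map a descriptor in [0, 1]^2 to the index of its grid cell."""
    return tuple(min(int(d * bins), bins - 1) for d in descriptor)


def try_insert(archive, x, bins):
    """Keep x if its cell is empty or x beats the current elite."""
    fitness, descriptor = evaluate(x)
    cell = cell_of(descriptor, bins)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, x)
        return True
    return False


def mutate(x, rng, sigma=0.1):
    """Random Gaussian mutation, as in plain MAP-Elites."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def generate_for_descriptor(d, dim):
    """Hypothetical generator: returns a solution aimed at descriptor d.
    In the paper this role is played by the descriptor-conditioned actor."""
    return list(d) + [0.0] * (dim - len(d))


def map_elites_with_injection(generations=50, batch=16, n_inject=4,
                              dim=4, bins=5, seed=0):
    rng = random.Random(seed)
    archive = {}
    # Initialise the archive with random solutions.
    for _ in range(batch):
        try_insert(archive, [rng.uniform(0, 1) for _ in range(dim)], bins)
    for _ in range(generations):
        # Part of the batch comes from mutating elites sampled from the archive.
        elites = list(archive.values())
        parents = [rng.choice(elites)[1] for _ in range(batch - n_inject)]
        offspring = [mutate(p, rng) for p in parents]
        # The rest is generated for sampled target descriptors and injected.
        targets = [(rng.random(), rng.random()) for _ in range(n_inject)]
        offspring += [generate_for_descriptor(d, dim) for d in targets]
        for x in offspring:
            try_insert(archive, x, bins)
    return archive
```

Calling `map_elites_with_injection()` returns a dictionary that maps grid cells to `(fitness, solution)` pairs. The injected solutions compete for cells under the same insertion rule as the mutated ones, so injection can only add or improve elites, never displace a better one.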

Paper Structure (48 sections, 8 equations, 12 figures, 7 tables, 16 algorithms)

Introduction
Background
Problem Statement
MAP-Elites
Deep Reinforcement Learning
PGA-MAP-Elites
Methods
Descriptor-Conditioned Critic
Descriptor-Conditioned Actor and Archive Distillation
Actor-Critic Training
Descriptor-Conditioned PG Variation
Descriptor-Conditioned Actor Injection
Experiments
Tasks
Main Results
...and 33 more sections

Figures (12)

Figure 1: Descriptor-Conditioned Actor Injection transforms the generally capable descriptor-conditioned actor $\pi_\phi(s \mid d)$ into a specialized, unconditioned policy $\pi_{\psi_d}(s)$. For a given descriptor $d$, it generates a policy $G_\phi(d) = \pi_{\psi_d}$ with an architecture matching the policies in the archive. This enables the injection of specialized versions of the actor into different niches of the solution space.
Figure 2: QD score, coverage and max fitness for DCRL-MAP-Elites, DCG-MAP-Elites and all baselines on all tasks. Each experiment is replicated 20 times with random seeds. The solid line is the median and the shaded area represents the first and third quartiles.
Figure 3: Ant Omni Archive at the end of training for all algorithms.
Figure 4: QD score, coverage and max fitness for DCRL-MAP-Elites, DCG-MAP-Elites and the ablations on all tasks. Each experiment is replicated 20 times with random seeds. The solid line is the median and the shaded area represents the first and third quartiles.
Figure 5: Expected QD score, expected distance to descriptor (lower is better) and expected max fitness for DCRL-MAP-Elites, the descriptor-conditioned policy and the baselines on all tasks. Each experiment is replicated 20 times with random seeds.
...and 7 more figures
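The caption of Figure 1 states that a conditioned actor $\pi_\phi(s \mid d)$ is turned into an unconditioned policy $\pi_{\psi_d}(s)$ with the same architecture as the archive policies, but it does not say how. One construction that satisfies both properties, assuming the descriptor is concatenated with the state at the input of the first linear layer (an assumption, not confirmed by the text above), is to fold the descriptor's contribution into that layer's bias, since $W[s; d] + b = W_s s + (W_d d + b)$. The helper names below are hypothetical.

```python
def linear(W, b, x):
    """Dense layer without activation: returns W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def specialize_first_layer(W, b, state_dim, d):
    """Fold a fixed descriptor d into the bias of the first layer.

    W has one row per output unit and state_dim + len(d) columns.
    The result takes only the state as input.
    """
    # Columns that multiply the state are kept as the new weight matrix.
    W_s = [row[:state_dim] for row in W]
    # Columns that multiply the descriptor become a constant bias shift.
    b_d = [bi + sum(w * v for w, v in zip(row[state_dim:], d))
           for row, bi in zip(W, b)]
    return W_s, b_d


# Usage: both forms give the same pre-activations for the same state.
W = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
b = [0.1, -0.2]
s, d = [1.0, 2.0], [0.5]
conditioned = linear(W, b, s + d)
W_s, b_d = specialize_first_layer(W, b, state_dim=2, d=d)
unconditioned = linear(W_s, b_d, s)
```

Under this assumption the remaining layers are unchanged, so the specialized network has the input size and layer shapes of an ordinary state-only policy.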
