Exploring the Performance-Reproducibility Trade-off in Quality-Diversity

Manon Flageat; Hannah Janmohamed; Bryan Lim; Antoine Cully

TL;DR

This work addresses uncertainty in Quality-Diversity (QD) optimisation by formalising a performance-reproducibility trade-off and introducing the $\delta$-parametrisation to express user preferences over it. It develops five Uncertain QD (UQD) approaches: two a-priori weighted-sum approaches, two a-priori $\delta$-comparison approaches, and one a-posteriori approach based on Multi-Objective QD (MOQD). It demonstrates that explicitly accounting for reproducibility can improve archive quality across robotics tasks and extended benchmarks. The results show that the proposed methods yield a higher Corrected QD-Score and reproducible solutions, even when preferences are specified after optimisation. The study highlights the practical impact of balancing performance and reproducibility in uncertain domains and outlines directions for future work, including extending to fitness-reproducibility and refining a-posteriori adaptive strategies.

Abstract

Quality-Diversity (QD) algorithms have exhibited promising results across many domains and applications. However, uncertainty in fitness and behaviour estimations of solutions remains a major challenge when QD is used in complex real-world applications. While several approaches have been proposed to improve the performance in uncertain applications, many fail to address a key challenge: determining how to prioritise solutions that perform consistently under uncertainty, in other words, solutions that are reproducible. Most prior methods improve fitness and reproducibility jointly, ignoring the possibility that they could be contradictory objectives. For example, in robotics, solutions may reliably walk at 90% of the maximum velocity in uncertain environments, while solutions that walk faster are also more prone to falling over. As this is a trade-off, neither one of these two solutions is "better" than the other. Thus, algorithms cannot intrinsically select one solution over the other, but can only enforce given preferences over these two contradictory objectives. In this paper, we formalise this problem as the performance-reproducibility trade-off for uncertain QD. We propose four new a-priori QD algorithms that find optimal solutions for given preferences over the trade-offs. We also propose an a-posteriori QD algorithm for when these preferences cannot be defined in advance. Our results show that our approaches successfully find solutions that satisfy given preferences. Importantly, by simply accounting for this trade-off, our approaches perform better than existing uncertain QD methods. This suggests that considering the performance-reproducibility trade-off unlocks important stepping stones that are usually missed when only performance is optimised.
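The walking example in the abstract can be made concrete with a minimal sketch. It assumes each solution is summarised by two scalars, an expected fitness and a reproducibility score, both to be maximised. The `Solution` type, the numeric values, and the weight `w` are hypothetical illustrations, not the paper's definitions; the weighted sum is only one plausible way to encode an a-priori preference.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    name: str
    fitness: float          # expected performance, higher is better (assumed scale)
    reproducibility: float  # consistency under uncertainty, higher is better (assumed scale)


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least_as_good = a.fitness >= b.fitness and a.reproducibility >= b.reproducibility
    strictly_better = a.fitness > b.fitness or a.reproducibility > b.reproducibility
    return at_least_as_good and strictly_better


def weighted_score(s: Solution, w: float) -> float:
    """Hypothetical scalarisation: w = 0 is fitness only, w = 1 is reproducibility only."""
    return (1.0 - w) * s.fitness + w * s.reproducibility


def preferred(candidates, w: float) -> str:
    """Name of the candidate with the highest weighted score for preference w."""
    return max(candidates, key=lambda s: weighted_score(s, w)).name


# Illustrative numbers only: a reliable walker at 90% of maximum velocity,
# and a faster walker that falls over more often.
reliable = Solution("reliable", fitness=0.90, reproducibility=0.95)
fast = Solution("fast", fitness=1.00, reproducibility=0.60)
```

Under these assumed numbers neither solution dominates the other, so only a stated preference can rank them: a small `w` favours the fast walker, a large `w` the reliable one.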

Paper Structure (49 sections, 5 equations, 12 figures, 2 tables)

Introduction
Background and Related Work
Quality-Diversity
MAP-Elites
Multi-Objective QD
Solution Reproducibility
Emergence of reproducibility
Link to robustness
Quantifying reproducibility and performance
Uncertain Quality-Diversity
Fixed-sampling approaches
Adaptive-sampling approaches
Other UQD approaches
Problem definition
Method
...and 34 more sections

Figures (12)

Figure 1: Importance of reproducibility: A QD algorithm produces an archive of solutions that is then used to solve a downstream task (top). Solutions with low reproducibility fail to reproduce their behaviour when deployed, and therefore fail to solve the downstream task (bottom).
Figure 2: Solution reproducibility: Illustration of the expected fitness and feature, and the fitness and feature reproducibilities of a solution.
Figure 3: $\delta$-parametrisation: we consider a new solution compared to an existing elite $e$. The x-axis represents the reproducibility of this new solution, and the y-axis its fitness. We place the reproducibility $r_e$ and fitness $f_e$ of the elite $e$ on these axes. Coloured areas indicate values for which the new solution replaces the elite $e$.
Figure 4: Robotic tasks results: (top) Corrected QD-Score, displaying the quality and diversity of the final archive, and (bottom) Reproducibility-Score, quantifying the reproducibility of the solutions in the final archive. For both metrics, a higher score is better. The vertical lines show the median across $10$ replications, the boxes the quartiles, the whiskers $1.5$ times the interquartile range, and the dots represent outliers. Each plot is split (by horizontal lines) into two parts: fixed-sampling approaches and adaptive-sampling approaches, each including both baselines and proposed approaches.
Figure 5: Example reproducibilities: trajectories obtained by the same policy replicated $32$ times in the Ant environment, displaying the importance of reproducibility. For each algorithm, we randomly sample $1$ of the $10$ seeds and $1$ feature value, replicate the corresponding solution $32$ times, and plot the resulting $32$ trajectories. We also display the full archive as a background for each algorithm, and the target feature as a red cross. The larger the spread of the trajectories, the lower the reproducibility of the solution.
...and 7 more figures
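The quantities named in the Figure 2 and Figure 5 captions can be estimated from repeated evaluations of one solution. The sketch below is an assumption-laden illustration: it takes reproducibility to be the negated spread of the replications (standard deviation for fitness, mean Euclidean distance to the mean feature for features), which matches the caption's "larger spread, lower reproducibility" but is not necessarily the paper's exact definition. The function name `summarise_replications` is hypothetical.

```python
import math
import statistics


def summarise_replications(fitnesses, features):
    """Summarise repeated evaluations of a single solution.

    fitnesses: list of floats, one per replication.
    features:  list of equal-length coordinate lists, one per replication.
    Reproducibility is returned as a negated spread (assumption), so higher is better.
    """
    expected_fitness = statistics.fmean(fitnesses)
    fitness_spread = statistics.pstdev(fitnesses)

    dims = len(features[0])
    # Per-dimension mean of the feature vectors.
    expected_feature = [
        statistics.fmean([f[d] for f in features]) for d in range(dims)
    ]
    # Mean Euclidean distance of each replication to the expected feature.
    feature_spread = statistics.fmean(
        [math.dist(f, expected_feature) for f in features]
    )

    return {
        "expected_fitness": expected_fitness,
        "fitness_reproducibility": -fitness_spread,
        "expected_feature": expected_feature,
        "feature_reproducibility": -feature_spread,
    }


# Two replications of a hypothetical solution with a 2-D feature.
summary = summarise_replications([1.0, 3.0], [[0.0, 0.0], [2.0, 0.0]])
```

With the $32$ replications described for Figure 5, the same function would take $32$ fitness values and $32$ final feature vectors; a tight cluster of trajectories gives a feature reproducibility close to zero, and a wide spread gives a strongly negative value.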
