
Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity

Lisa Coiffard, Paul Templier, Antoine Cully

TL;DR

The paper tackles deceptive optimization, where following immediate improvements traps solutions in local optima, by leveraging unsupervised quality-diversity (QD). It uses AURORA to learn latent feature representations from trajectories and introduces AURORA-XCon, which combines a contrastive learning objective with periodic extinction events to improve optimization without domain-specific features. Empirically, AURORA-XCon outperforms traditional optimization baselines and matches or exceeds QD baselines that use hand-crafted features across several robotics tasks, demonstrating the potential of unsupervised QD in broad domains. The work shows that unsupervised QD can shift its focus from novelty discovery to efficient optimization. It also notes task-dependent effects and computational trade-offs, and outlines avenues for continual representation learning and adaptive extinction strategies.
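To make the AURORA idea concrete, here is a minimal Python sketch of an archive whose niches are defined by learned latent descriptors rather than hand-crafted features. It is an illustration under stated assumptions, not the authors' implementation: it assumes an unstructured archive with a fixed distance threshold, the `linear_encoder` stand-in (a fixed linear map) replaces the trained encoder, and `UnstructuredArchive`, `threshold`, `try_add`, and `rebuild` are hypothetical names with a simplified add/replace rule. Any adaptation of the threshold over time is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Encoder = Callable[[Sequence[float]], List[float]]


def linear_encoder(weights: Sequence[Sequence[float]]) -> Encoder:
    # Hypothetical stand-in for the trained encoder: a fixed linear map from a
    # flattened trajectory (observation vector) to a latent descriptor.
    return lambda obs: [sum(w * o for w, o in zip(row, obs)) for row in weights]


@dataclass
class Elite:
    genotype: object
    observation: List[float]  # raw sensory data, kept so descriptors can be recomputed
    descriptor: List[float]   # latent descriptor produced by the current encoder
    fitness: float


@dataclass
class UnstructuredArchive:
    threshold: float  # hypothetical: minimum latent distance that opens a new niche
    elites: List[Elite] = field(default_factory=list)

    def try_add(self, genotype, observation, fitness: float, encoder: Encoder) -> bool:
        desc = encoder(observation)
        cand = Elite(genotype, list(observation), desc, fitness)
        if not self.elites:
            self.elites.append(cand)
            return True
        # Nearest stored elite in latent space.
        idx, dist = min(
            ((i, math.dist(e.descriptor, desc)) for i, e in enumerate(self.elites)),
            key=lambda pair: pair[1],
        )
        if dist > self.threshold:
            self.elites.append(cand)  # far from every elite: open a new niche
            return True
        if fitness > self.elites[idx].fitness:
            self.elites[idx] = cand  # same niche but fitter: replace the neighbour
            return True
        return False

    def rebuild(self, encoder: Encoder) -> None:
        # After the encoder is retrained, every descriptor changes. Re-insert the
        # stored elites fittest-first so each re-formed niche keeps its best member.
        old = sorted(self.elites, key=lambda e: e.fitness, reverse=True)
        self.elites = []
        for e in old:
            self.try_add(e.genotype, e.observation, e.fitness, encoder)


enc = linear_encoder([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
archive = UnstructuredArchive(threshold=0.5)
archive.try_add("a", [0.0, 0.0, 9.0], 1.0, enc)
archive.try_add("b", [1.0, 0.0, 9.0], 0.5, enc)
archive.try_add("c", [0.1, 0.0, 0.0], 2.0, enc)  # close to "a" and fitter, so it replaces "a"
print([e.genotype for e in archive.elites])  # prints ['c', 'b']
```

The point of storing raw observations is that descriptors are only as good as the current encoder; re-encoding after each training phase lets niches follow the learned representation.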

Abstract

Policy optimization seeks the best solution to a control problem according to an objective, or fitness, function; it is a fundamental field of engineering and research, with applications in robotics. Traditional optimization methods such as reinforcement learning and evolutionary algorithms struggle with deceptive fitness landscapes, where following immediate improvements leads to suboptimal solutions. Quality-diversity (QD) algorithms offer a promising alternative by maintaining diverse intermediate solutions that serve as stepping stones for escaping local optima. However, QD algorithms require domain expertise to define hand-crafted features, which limits their applicability in domains where it is unclear how to characterize solution diversity. In this paper, we show that unsupervised QD algorithms (specifically the AURORA framework, which learns features from sensory data) efficiently solve deceptive optimization problems without domain expertise. By enhancing AURORA with contrastive learning and periodic extinction events, we propose AURORA-XCon, which outperforms all traditional optimization baselines and matches the best QD baseline that uses domain-specific hand-crafted features, in some cases improving on it by up to 34%. This work establishes a novel application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and expanding their potential to domains where defining feature spaces is challenging.
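The deceptiveness argument and the stepping-stone idea can be illustrated with a toy 1D problem. The sketch below is hypothetical and not from the paper: `fitness` has a local peak at x=2 and a higher peak at x=9, separated by a flat zero-fitness valley. A greedy hill climber that never accepts a worse mutant stays confined around the local peak, while a MAP-Elites-style grid archive over the hand-crafted descriptor x keeps zero-fitness solutions in the valley as stepping stones. Both methods use the same mutation operator, so only the archive differs.

```python
import random


def fitness(x: float) -> float:
    # Local peak of height 1 at x=2, global peak of height 2 at x=9, and a flat
    # zero-fitness valley in between: greedy improvement cannot cross it.
    return max(1.0 - abs(x - 2.0), 2.0 - 2.0 * abs(x - 9.0), 0.0)


def mutate(x: float, rng: random.Random) -> float:
    # Uniform perturbation in [-1, 1], clamped to the search domain [0, 10].
    return min(max(x + rng.uniform(-1.0, 1.0), 0.0), 10.0)


def hill_climb(x0: float, iters: int, rng: random.Random) -> float:
    # Greedy (1+1) search: accept a mutant only if it is not worse.
    x = x0
    for _ in range(iters):
        y = mutate(x, rng)
        if fitness(y) >= fitness(x):
            x = y
    return fitness(x)


def map_elites(x0: float, iters: int, rng: random.Random, n_cells: int = 20) -> float:
    # 1D grid archive over the descriptor x in [0, 10]; each cell keeps its
    # fittest solution, so valley solutions survive even with zero fitness.
    archive = {}

    def insert(x: float) -> None:
        cell = min(int(x / 10.0 * n_cells), n_cells - 1)
        if cell not in archive or fitness(x) > fitness(archive[cell]):
            archive[cell] = x

    insert(x0)
    for _ in range(iters):
        parent = rng.choice(list(archive.values()))
        insert(mutate(parent, rng))
    return max(fitness(x) for x in archive.values())


hill_best = hill_climb(1.5, 20_000, random.Random(0))
qd_best = map_elites(1.5, 20_000, random.Random(0))
print(f"hill climber best: {hill_best:.3f}, grid archive best: {qd_best:.3f}")
```

Here the descriptor is x itself, a hand-crafted feature that is trivial to choose in 1D. The abstract's point is that such features are often unavailable in real control problems, which is the gap that learned descriptors in AURORA aim to fill.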

Paper Structure

This paper contains 24 sections, 2 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of the AURORA-XCon algorithm with two key contributions: (a) encoder training with a contrastive objective (the triplet loss) and (b) periodic extinction events.
  • Figure 2: Overview of the environments used in our experiments. From left to right: AntMaze, HalfCheetah, Walker and Kheperax standard maze.
  • Figure 3: Maximum fitness tracked over 1 million evaluations. We show non-PGA variants for Kheperax and PGA variants for all Brax tasks. We plot the median (solid line) and interquartile range (IQR, shaded area).
  • Figure 4: Maximum fitness tracked over 1 million evaluations. We show non-PGA variants for Kheperax and PGA variants for all Brax tasks. We plot the median (solid line) and IQR (shaded area).
  • Figure 5: Final performance after 1 million evaluations, showing median and IQR. Results show evaluations-to-goal for Kheperax (non-PGA variants) and maximum fitness for Brax tasks (PGA variants).
  • ...and 1 more figure
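Figure 1 names the two ingredients of AURORA-XCon: a triplet-loss contrastive objective for the encoder and periodic extinction events. A hedged Python sketch of both follows, assuming the standard triplet margin loss on Euclidean latent distances and an extinction event that keeps a uniformly random subset of the archive on a fixed schedule. How the paper forms triplets, and its actual extinction schedule and survival rate, are not given in this section, so `margin`, `period`, and `survival_fraction` are hypothetical parameters.

```python
import math
import random
from typing import List, Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    # Standard triplet margin loss: penalize the encoder unless the positive is
    # closer to the anchor than the negative by at least `margin` in latent space.
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def extinction_event(elites: List, survival_fraction: float,
                     rng: random.Random) -> List:
    # Remove a uniformly random subset of the archive, keeping roughly
    # `survival_fraction` of the elites (always at least one).
    if not elites:
        return []
    n_keep = max(1, int(round(survival_fraction * len(elites))))
    return rng.sample(elites, n_keep)


def maybe_extinct(generation: int, period: int, elites: List,
                  survival_fraction: float, rng: random.Random) -> List:
    # Hypothetical fixed schedule: trigger an extinction every `period` generations.
    if generation > 0 and generation % period == 0:
        return extinction_event(elites, survival_fraction, rng)
    return elites


print(triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))  # prints 0.0
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))  # prints 6.0
survivors = maybe_extinct(10, 10, list(range(100)), 0.1, random.Random(0))
print(len(survivors))  # prints 10
```

In a full pipeline the loss would be averaged over many triplets and minimized by gradient descent on the encoder parameters; only the loss value is computed here. After an extinction, the surviving elites act as a sparse set of parents from which the archive is repopulated.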