When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks?

Eleni Nisioti, Joachim Winther Pedersen, Erwan Plantec, Milton L. Montero, Sebastian Risi

TL;DR

The paper investigates when neuroevolution (NE) can outperform reinforcement learning (RL) in transfer learning. It introduces two curriculum-based benchmarks, Stepping gates and Ecorobot, and evaluates a spectrum of NE and RL methods. It systematically compares direct (NEAT) and indirect (HyperNEAT) encodings, as well as diversity-driven (MAP-Elites) and gradient-free (CMA-ES) optimizers, against PPO baselines. Direct encodings generally transfer better across tasks, with NEAT often matching or surpassing CMA-ES, while indirect encodings excel at escaping local optima but struggle with skill transfer. The results highlight that curriculum structure and task complexity, especially with evolving morphologies, critically shape transfer performance, and that no single method yet solves both high-level transfer and complex locomotion. The study suggests hybrid encoding strategies that combine the strengths of direct and indirect mappings and calls for scalable benchmarks to push NE toward real-world applicability.

Abstract

The ability to continuously and efficiently transfer skills across tasks is a hallmark of biological intelligence and a long-standing goal in artificial systems. Reinforcement learning (RL), a dominant paradigm for learning in high-dimensional control tasks, is known to suffer from brittleness to task variations and catastrophic forgetting. Neuroevolution (NE) has recently gained attention for its robustness, scalability, and capacity to escape local optima. In this paper, we investigate an understudied dimension of NE: its transfer learning capabilities. To this end, we introduce two benchmarks: a) in stepping gates, neural networks are tasked with emulating logic circuits, with designs that emphasize modular repetition and variation; b) ecorobot extends the Brax physics engine with objects such as walls and obstacles and the ability to easily switch between different robotic morphologies. Crucial in both benchmarks is the presence of a curriculum that enables evaluating skill transfer across tasks of increasing complexity. Our empirical analysis shows that NE methods vary in their transfer abilities and frequently outperform RL baselines. Our findings support the potential of NE as a foundation for building more adaptable agents and highlight future challenges for scaling NE to complex, real-world problems.

Paper Structure

This paper contains 27 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Illustration of the stepping gates benchmark. (top) N-parity is a task with N input bits and 1 output bit where the first level starts with two input bits and each next level adds an extra bit. In this example, with a task of 3 bits at level 1, a full episode will go through all 8 combinations of the 3 active bits. Inactive bits are set to 0 and the optimal action for each step is depicted inside the red rectangle. On the left, we illustrate a solution to the 3-parity problem that leverages a solution to 2-parity. (bottom) The Simple ALU task has 4 input, 4 output, and 4 control bits. The episode length and number of active inputs depend on the current level. At level 1, the task requires implementing a multiplexer and a NAND gate. A single control bit is used to switch between the two.
  • Figure 2: Illustration of the ecorobot benchmark. An environment consists of a robot and a task. (left) We currently support different Brax robots and a simpler robot, SimpleRob. Robots can be equipped with rangefinders and pie-slice sensors. (right) A task consists of a choice of objects and a reward function. We have implemented tasks that test for different behavioral challenges, with the Stepping stones maze and Hierarchical obstacles tasks being specifically designed to test for transfer learning.
  • Figure 3: Success in the stepping-gates benchmark with levels of increasing difficulty: (top) N-parity, (bottom) Simple ALU.
  • Figure 4: Ablation for the N-parity task. When directly solving 6-parity (without going through the intermediate levels), PPO solves the task while NEAT's performance degrades slightly.
  • Figure 5: Comparing the ability of methods to progress through stepping stones in the maze. NEAT and CMA-ES perform best, PPO reaches only the first stepping stone, while MAP-Elites collects low rewards because it reaches the end of the maze without collecting the stones.
  • ...and 6 more figures
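
The N-parity episode described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the benchmark's actual implementation: the names `parity_episode`, `parity_from_smaller`, and `TOTAL_BITS`, the total input width of 6, and the convention that the target is 1 when an odd number of active bits are set are all assumptions made here. It enumerates every combination of the active bits, pads inactive bits with 0, and shows how an N-parity solution can reuse an (N-1)-parity solution by XOR-ing in the new bit.

```python
from itertools import product

TOTAL_BITS = 6  # assumed width of the full input vector (hypothetical value)


def parity(bits):
    """Return 1 if an odd number of bits are set, else 0 (assumed convention)."""
    return sum(bits) % 2


def parity_episode(n_active, total_bits=TOTAL_BITS):
    """List the (observation, target) pairs of one episode.

    The episode visits all 2**n_active patterns of the active bits;
    inactive bits are padded with 0, as in the Figure 1 caption.
    """
    steps = []
    for active in product((0, 1), repeat=n_active):
        observation = list(active) + [0] * (total_bits - n_active)
        steps.append((observation, parity(active)))
    return steps


def parity_from_smaller(bits):
    """Compute N-parity by reusing (N-1)-parity and XOR-ing in the last bit."""
    if len(bits) == 1:
        return bits[0]
    return parity_from_smaller(bits[:-1]) ^ bits[-1]


episode = parity_episode(3)  # 3 active bits give 8 steps
```

The recursive helper mirrors the caption's point that a 3-parity solution can be built on top of a 2-parity solution, which is the kind of reuse a curriculum of increasing bit counts is meant to reward.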