Genetic Drift Regularization: on preventing Actor Injection from breaking Evolution Strategies

Paul Templier; Emmanuel Rachelson; Antoine Cully; Dennis G. Wilson

TL;DR

Genetic Drift Regularization addresses the issue that injecting an RL actor into Evolution Strategies can cause genetic drift and degrade ES performance. It introduces a lightweight regularization that keeps the actor genome $\theta_A$ close to the ES center $\theta_{ES}$, with a Lagrangian form (GDR) and a squared-distance variant (SGDR). Across Brax continuous-control tasks, GDR reduces drift, maintains or improves ES convergence, and enhances RL learning when injection would otherwise hurt performance, outperforming some prior injection baselines. The method is simple to implement, computationally cheap, and broadly applicable to ES+RL hybrids, with potential extensions to trust-region formulations and landscape-alignment approaches.
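The squared-distance variant described above can be illustrated with a minimal sketch. This is not the authors' implementation: the exact loss is not given in this summary, so the penalty form `actor_loss + beta * ||theta_a - theta_es||^2`, the coefficient name `beta`, and all function names below are assumptions made for illustration. Genomes are treated as flat lists of floats.

```python
import math


def genetic_drift(theta_a, theta_es):
    """Euclidean distance between the actor genome and the ES center."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(theta_a, theta_es)))


def sgdr_loss(actor_loss, theta_a, theta_es, beta):
    """Actor loss plus a squared-distance penalty.

    Assumed form: actor_loss + beta * ||theta_a - theta_es||^2.
    `beta` is a hypothetical penalty coefficient, not a value from the paper.
    """
    sq_dist = sum((a - e) ** 2 for a, e in zip(theta_a, theta_es))
    return actor_loss + beta * sq_dist


def sgdr_penalty_grad(theta_a, theta_es, beta):
    """Gradient of the penalty term with respect to theta_a.

    Each component pulls the actor parameter back toward the ES center.
    """
    return [2.0 * beta * (a - e) for a, e in zip(theta_a, theta_es)]
```

In this sketch the penalty gradient is zero when the actor sits exactly at the ES center and grows linearly with the offset, so the pull toward $\theta_{ES}$ strengthens as the actor drifts.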

Abstract

Evolutionary Algorithms (EA) have been successfully used to optimize neural networks for policy search, but they remain sample inefficient and in some cases underperform gradient-based reinforcement learning (RL). Various methods combine the two approaches, many of them training an RL algorithm on data from EA evaluations and injecting the RL actor into the EA population. However, when Evolution Strategies (ES) are used as the EA, the RL actor can drift genetically far from the ES distribution, and injection can cause a collapse of the ES performance. Here, we highlight the phenomenon of genetic drift, where the actor genome and the ES population distribution progressively drift apart, leading to injection having a negative impact on the ES. We introduce Genetic Drift Regularization (GDR), a simple regularization method in the actor training loss that prevents the actor genome from drifting away from the ES. We show that GDR can improve ES convergence on problems where RL learns well, but also helps RL training on other tasks, and fixes the injection issues better than previous controlled injection methods.

Paper Structure (30 sections, 6 equations, 7 tables, 2 algorithms)

Introduction
Evolution Strategies for Policy Search
Evolutionary Reinforcement Learning
Actor injection
Genetic Drift
Detrimental genetic diversity
Measuring Genetic Drift in ES
Genetic drift mitigation
Genetic Drift Regularization
Lagrangian formulation
Lagrangian relaxation
Squared distance
Experiments and Results
Control tasks
Algorithms
...and 15 more sections
