Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network

Bingdong Li, Mei Jiang, Hong Qian, Ke Tang, Aimin Zhou, Peng Yang

TL;DR

This paper tackles the high computational cost of evolutionary reinforcement learning in high-dimensional policy spaces by introducing a learnable surrogate-assisted framework (AE-HNN-NCS). It combines an Autoencoder for adaptive policy embedding with a Hyperbolic Neural Network surrogate to perform ranking-based pre-selection, reducing real environment evaluations while preserving search efficacy. Empirical results on 10 Atari games and 4 MuJoCo tasks show that AE-HNN-NCS outperforms baselines and state-of-the-art ERL methods, with faster wall-clock training due to reduced evaluations and more structured exploration trajectories. The approach offers a scalable, end-to-end learnable solution to the curse of dimensionality in ERL and points to future enhancements in autoencoder variants, regression surrogates, and diversification strategies.

Abstract

Evolutionary Reinforcement Learning (ERL), which trains Reinforcement Learning (RL) policies with Evolutionary Algorithms (EAs), has demonstrated enhanced exploration capabilities and greater robustness than traditional policy gradient methods. However, ERL suffers from high computational cost and low search efficiency, because EAs must evaluate numerous candidate policies with expensive simulations, many of which are ineffective and do not contribute meaningfully to training. One intuitive way to reduce these ineffective evaluations is to adopt surrogates. Unfortunately, ERL policies are often modeled as deep neural networks (DNNs) and are thus naturally represented as high-dimensional vectors containing millions of weights, which makes building effective surrogates for ERL policies extremely challenging. This paper proposes a novel surrogate-assisted ERL method that integrates Autoencoders (AE) and Hyperbolic Neural Networks (HNN). Specifically, the AE compresses high-dimensional policies into low-dimensional representations while extracting key features as inputs for the surrogate. The HNN, functioning as a classification-based surrogate model, can learn complex nonlinear relationships from sampled data and enables more accurate pre-selection of the sampled policies without real evaluations. Experiments on 10 Atari games and 4 MuJoCo tasks verify that the proposed method significantly outperforms previous approaches. The search trajectories guided by the AE and HNN are also shown visually to be more effective in terms of both exploration and convergence. This paper not only presents the first learnable policy embedding and surrogate-modeling modules for high-dimensional ERL policies, but also empirically reveals when and why they can be successful.
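
The pre-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. A fixed random projection stands in for the trained autoencoder, a hand-written scoring function stands in for the HNN surrogate, a quadratic function stands in for the expensive environment rollout, and a simple Gaussian mutation step stands in for the underlying evolutionary search. All function names and parameter values here are hypothetical. The only point is the control flow: many offspring are sampled, all are scored cheaply in the embedded space, and only a shortlist receives a real evaluation.

```python
import random


def make_projection(dim_in, dim_out, rng):
    # Stand-in for a trained encoder (assumption): a fixed random linear map.
    scale = 1.0 / dim_in ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(dim_in)] for _ in range(dim_out)]


def encode(policy, projection):
    # Compress a flat weight vector into a low-dimensional embedding.
    return [sum(w * x for w, x in zip(row, policy)) for row in projection]


def surrogate_score(embedding):
    # Stand-in for the learned surrogate (assumption): prefers small embeddings,
    # which loosely tracks the toy fitness below.
    return -sum(z * z for z in embedding)


def true_fitness(policy):
    # Stand-in for an expensive environment rollout (assumption).
    return -sum(w * w for w in policy)


def evolve_step(parent, projection, rng, n_offspring=20, keep=4, sigma=0.1):
    # 1) Sample offspring in the original high-dimensional space.
    offspring = [[w + rng.gauss(0.0, sigma) for w in parent] for _ in range(n_offspring)]
    # 2) Embed each offspring and rank by surrogate score (no real evaluation).
    ranked = sorted(
        offspring,
        key=lambda p: surrogate_score(encode(p, projection)),
        reverse=True,
    )
    # 3) Spend real evaluations only on the shortlisted candidates.
    evaluated = [(true_fitness(p), p) for p in ranked[:keep]]
    best_fitness, best_policy = max(evaluated, key=lambda pair: pair[0])
    return best_policy, best_fitness, len(evaluated)


rng = random.Random(0)
projection = make_projection(dim_in=50, dim_out=5, rng=rng)
parent = [rng.uniform(-1.0, 1.0) for _ in range(50)]
best_policy, best_fitness, n_real_evals = evolve_step(parent, projection, rng)
print(n_real_evals)  # 4 real evaluations for 20 sampled offspring
```

In this toy setting the saving is the ratio `keep / n_offspring`. Whether the shortlist actually contains the best offspring depends entirely on how well the surrogate's ranking agrees with the true fitness, which is the part the paper addresses with learned components.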

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: The proposed ERL framework integrates an Autoencoder and a Hyperbolic Neural Network for enhanced dimensionality reduction and policy pre-selection. The left part shows the traditional ERL workflow with EA and RL phases. Our algorithm innovates in the EA phase. We first use an Autoencoder to embed offspring policies to a lower-dimensional space. Then, the Hyperbolic Neural Network acts as a surrogate model, predicting the rank of policies' qualities. This enables efficient pre-selections, reducing unnecessary evaluations. After pre-selection, the selected policies are mapped to the high-dimensional space for real fitness evaluations, and then are incorporated into the evolutionary search process.
  • Figure 2: Overview of the autoencoder.
  • Figure 3: Performance analysis of ERL with NCS and AE-HNN-NCS in different environments. Figures (a) and (b) compare ERL with NCS and AE-HNN-NCS in terms of time cost and final policy reward.
  • Figure 4: The ranking consistency of final performances among three strategies—(1) high-dimensional space searching with low-dimensional space pre-selection, (2) low-dimensional space searching with low-dimensional space pre-selection, and (3) low-dimensional space searching and high-dimensional pre-selection.
  • Figure 5: The t-SNE visualization results of HNN-based policy pre-selection on three Atari games: Alien, Pong, and Freeway. Each point represents a policy embedding in hyperbolic space; color indicates HNN-predicted score (dark blue = promising); dashed ring marks the high-confidence region used for evaluation.
  • ...and 1 more figures
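
As background for the hyperbolic embeddings mentioned in the Figure 5 caption, the following sketch computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The text above does not say which model of hyperbolic space the HNN uses, so the choice of the Poincaré ball is an assumption made only for illustration, and `poincare_distance` is a hypothetical helper name.

```python
import math


def poincare_distance(u, v):
    # Geodesic distance between two points strictly inside the unit ball:
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


origin = [0.0, 0.0]
near_center = [0.1, 0.0]
near_boundary = [0.9, 0.0]
print(poincare_distance(origin, near_center))
print(poincare_distance(origin, near_boundary))
```

Distances grow without bound as a point approaches the boundary of the ball, so the same Euclidean step covers far more hyperbolic distance near the edge than near the center.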