Self-evolving Autoencoder Embedded Q-Network

J. Senthilnath; Bangjian Zhou; Zhen Wei Ng; Deeksha Aggarwal; Rajdeep Dutta; Ji Wei Yoon; Aye Phyu Phyu Aung; Keyu Wu; Min Wu; Xiaoli Li

TL;DR

This work tackles exploration inefficiency in reinforcement learning with large or evolving state spaces by integrating a self-evolving autoencoder (SA) into a Q-Network (QN). The SA adapts its architecture through a bias-variance regulatory strategy that grows or prunes hidden neurons, producing disentangled latent representations that reduce Q-value estimation errors. Empirical results across CartPole, LunarLander, Minigrid, and molecular optimization demonstrate that SAQN outperforms conventional QN and integrated AE-QN approaches in convergence and rewards, at the cost of increased computation due to the evolving encoder. The approach offers a practical path to more efficient, adaptable RL in environments with high-dimensional or sparse observations, with potential extensions to continuous actions via variational or generative latent models and policy-based methods.

Abstract

In sequential decision-making tasks, the exploration capability of a reinforcement learning (RL) agent is paramount for achieving high rewards through interactions with the environment. To enhance this ability, we propose SAQN, a novel approach in which a self-evolving autoencoder (SA) is embedded with a Q-Network (QN). In SAQN, the self-evolving autoencoder architecture adapts and evolves as the agent explores the environment. This evolution enables the autoencoder to capture a diverse range of raw observations and represent them effectively in its latent space. By leveraging the disentangled states extracted from the encoder-generated latent space, the QN is trained to determine optimal actions that improve rewards. During the evolution of the autoencoder architecture, a bias-variance regulatory strategy is employed to elicit the optimal response from the RL agent. This strategy involves two key components: (i) fostering the growth of nodes to retain previously acquired knowledge, ensuring a rich representation of the environment, and (ii) pruning the least contributing nodes to maintain a more manageable and tractable latent space. Extensive experimental evaluations conducted on three distinct benchmark environments and a real-world molecular environment demonstrate that the proposed SAQN significantly outperforms state-of-the-art counterparts. The results highlight the effectiveness of the self-evolving autoencoder and its collaboration with the Q-Network in tackling sequential decision-making tasks.
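
The grow-and-prune idea in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the class `EvolvingEncoder`, the function `evolve_step`, the tied-weight single hidden layer, the two thresholds, and the proxies used here (mean squared reconstruction error standing in for bias, spread of the reconstructions standing in for variance, mean absolute activation standing in for a node's contribution) are all assumptions made for illustration. The paper's actual criteria are given in its bias-variance analysis section, which is not part of this page.

```python
import math
import random


class EvolvingEncoder:
    """Tied-weight autoencoder whose hidden width can change (hypothetical sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_in = n_in
        self.W = [self._new_row() for _ in range(n_hidden)]  # one row per hidden node
        self.b = [0.0] * n_hidden

    def _new_row(self):
        return [self.rng.gauss(0.0, 0.1) for _ in range(self.n_in)]

    def encode(self, x):
        # Latent state: one tanh activation per hidden node.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, z):
        # Reconstruction with the transposed (tied) weights.
        return [sum(zj * row[i] for zj, row in zip(z, self.W))
                for i in range(self.n_in)]

    def grow(self):
        # Add a node; existing rows are untouched, so learned weights are kept.
        self.W.append(self._new_row())
        self.b.append(0.0)

    def prune(self, batch):
        # Assumption: "least contributing" = smallest mean absolute activation.
        codes = [self.encode(x) for x in batch]
        usage = [sum(abs(z[j]) for z in codes) / len(codes)
                 for j in range(len(self.W))]
        j = usage.index(min(usage))
        del self.W[j]
        del self.b[j]


def evolve_step(enc, batch, bias_thresh, var_thresh):
    """Grow on a high bias proxy, prune on a high variance proxy (assumed rule)."""
    recons = [enc.decode(enc.encode(x)) for x in batch]
    n, d = len(batch), enc.n_in
    # Bias proxy: mean squared reconstruction error over the batch.
    bias_proxy = sum((r[i] - x[i]) ** 2
                     for r, x in zip(recons, batch) for i in range(d)) / (n * d)
    # Variance proxy: spread of the reconstructions around their own mean.
    means = [sum(r[i] for r in recons) / n for i in range(d)]
    var_proxy = sum((r[i] - means[i]) ** 2
                    for r in recons for i in range(d)) / (n * d)
    if bias_proxy > bias_thresh:
        enc.grow()
        return "grow"
    if var_proxy > var_thresh and len(enc.W) > 1:
        enc.prune(batch)
        return "prune"
    return "keep"
```

In a full agent, `enc.encode(observation)` would supply the latent state passed to the Q-network, and a step like `evolve_step` would run on recent observations as exploration proceeds. The training of the autoencoder weights and of the Q-network itself is omitted here.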

Paper Structure (24 sections, 20 equations, 9 figures, 3 tables, 2 algorithms)

This paper contains 24 sections, 20 equations, 9 figures, 3 tables, 2 algorithms.

Introduction
Related Works
Generative RL
AE Architecture
Preliminaries
Problem Definition
Q-Network
Estimation error in Q-values
Generating latent states
Our Proposed SAQN Methodology
Self-evolving Autoencoder
Analysis of bias-variance in Self-evolving Autoencoder
Experiments
Experimental settings
Evaluation of benchmark environment
...and 9 more sections

Figures (9)

Figure 1: Schematic diagram of the SAQN [+ indicates adding neuron; - indicates pruning neuron]
Figure 2: Average reward vs. episodes by different RL agents in the CartPole environment.
Figure 3: Average reward vs. episodes by different RL agents in the LunarLander-v2 environment.
Figure 4: Average reward vs. episodes by different RL agents in the Minigrid environment.
Figure 5: t-SNE plots of raw observation states for [a] CartPole-v0 and [c] LunarLander-v2; latent representations from the pre-trained SAQN encoder layer for [b] CartPole-v0 and [d] LunarLander-v2.
...and 4 more figures
