Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning
Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu
TL;DR
The paper tackles the vulnerability of Evolution Strategies to task-irrelevant features by introducing NESHT, which combines Natural Evolution Strategies with a Hard-Thresholding operator to enforce $L_0$ sparsity in policy parameters. It provides a rigorous convergence analysis under boundedness and smoothness assumptions, deriving gradient-estimator error bounds and convergence rates that account for the sparsity constraint. Empirically, NESHT demonstrates robustness to noisy MuJoCo environments with sparse rewards and improved performance on pixel-based Atari tasks, outperforming vanilla NES and several RL baselines in many settings. The work offers a principled, scalable approach to feature selection within ES-based reinforcement learning, with practical implications for real-world applications where reward signals are imperfect and observations include substantial irrelevancies.
Abstract
Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like MuJoCo and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES, that all input features are task-relevant, poses challenges, especially when confronted with irrelevant features common in real-world problems. This work scrutinizes this limitation, particularly focusing on the Natural Evolution Strategies (NES) variant. We propose NESHT, a novel approach that integrates Hard-Thresholding (HT) with NES to champion sparsity, ensuring only pertinent features are employed. Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy MuJoCo and Atari tasks.
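
To make the idea of pairing NES with hard-thresholding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard antithetic Gaussian-smoothing gradient estimator and a top-k magnitude projection applied after every ascent step. The function names (`hard_threshold`, `nes_gradient`, `nes_ht`) and all hyperparameter values (`sigma`, `lr`, `n_pairs`, `steps`) are hypothetical choices for illustration; the paper's exact estimator, step size, and schedule may differ.

```python
import numpy as np

def hard_threshold(theta, k):
    """Keep the k largest-magnitude entries of theta and zero the rest."""
    if k >= theta.size:
        return theta.copy()
    out = np.zeros_like(theta)
    if k <= 0:
        return out
    # Indices of the k entries with the largest absolute value (unordered).
    keep = np.argpartition(np.abs(theta), -k)[-k:]
    out[keep] = theta[keep]
    return out

def nes_gradient(f, theta, sigma, n_pairs, rng):
    """Antithetic Gaussian-smoothing estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        # Finite difference along a random direction, using mirrored samples.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2.0 * sigma * n_pairs)

def nes_ht(f, theta0, k, sigma=0.1, lr=0.05, n_pairs=20, steps=200, seed=0):
    """Maximize f with an ES gradient step followed by a top-k projection."""
    rng = np.random.default_rng(seed)
    theta = hard_threshold(np.asarray(theta0, dtype=float), k)
    for _ in range(steps):
        g = nes_gradient(f, theta, sigma, n_pairs, rng)
        # Ascent step on the estimated gradient, then enforce ||theta||_0 <= k.
        theta = hard_threshold(theta + lr * g, k)
    return theta

if __name__ == "__main__":
    # Toy objective (an assumption for illustration): only two of six
    # parameters matter, mimicking a task with irrelevant features.
    target = np.array([2.0, 0.0, 0.0, -1.0, 0.0, 0.0])
    reward = lambda th: -np.sum((th - target) ** 2)
    theta = nes_ht(reward, np.zeros(6), k=2)
    print("nonzero entries:", np.count_nonzero(theta))
    print("theta:", theta)
```

In this sketch the projection runs after every update, so each iterate has at most `k` nonzero parameters. In a policy-learning setting, `f` would be the (noisy) episodic return of a policy parameterized by `theta`, which is why the estimator only needs function evaluations and no backpropagation.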
