Evolutionary Warm-Starts for Reinforcement Learning in Industrial Continuous Control

Tom Maus, Stephan Frank, Tobias Glasmachers

Abstract

Reinforcement learning (RL) is still rarely applied in industrial control, partly because it is difficult to train agents that remain reliable under real-world conditions. This work investigates how evolution strategies can support RL in such settings by introducing a continuous-control adaptation of an industrial sorting benchmark. The CMA-ES algorithm is used to generate high-quality demonstrations that warm-start RL agents. Results show that CMA-ES-guided initialization significantly improves stability and performance. Furthermore, the demonstration trajectories generated with CMA-ES provide a strong oracle reference for achievable performance, which is of interest in its own right. The study delivers a focused proof of concept for hybrid evolutionary-RL approaches and a basis for future, more complex industrial applications.

Paper Structure

This paper contains 5 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Simplified schematic of the continuous sorting process, based on [maus_sortingenv_2025, maus_leveraging_2025]. The agent regulates the input stream into the sorting machine for separation of materials. Output quantities and purities define the performance targets.
  • Figure 2: Performance comparison across 20 test seeds: mean cumulative reward with standard deviation (top) and reward distributions (bottom).