
The Crucial Role of Problem Formulation in Real-World Reinforcement Learning

Georg Schäfer, Tatjana Krau, Jakob Rehrl, Stefan Huber, Simon Hirlaender

TL;DR

The paper argues that problem formulation is a critical, often overlooked driver of real-world RL performance in industrial cyber-physical systems. It proposes a structured set of design principles and validates them on a 1-DoF helicopter (Quanser Aero 2) through both simulation and hardware experiments, showing substantial gains in stability, sample efficiency, and final policy quality. Key contributions include normalization of observations and actions, randomization of targets and initial states, longer episode horizons, and reward shaping with an action penalty, all of which enable training without a priori models. The results demonstrate the feasibility of deploying RL directly on physical ICPS hardware and highlight the potential for a robust RL engineering workflow that bridges RL research and real-world industrial control needs.
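
Two of the principles listed above are normalization of observations and actions and reward shaping with an action penalty. Below is a minimal sketch of both. The pitch, pitch-rate, and voltage limits are hypothetical placeholders and are not taken from the paper. The reward form (normalized absolute tracking error plus a quadratic action penalty weighted by a hypothetical `action_weight`) is an illustrative assumption, not the authors' exact formulation.

```python
import numpy as np

# Hypothetical limits for illustration only; not the values used in the paper.
PITCH_LIMIT_RAD = np.deg2rad(90.0)  # assumed pitch range of [-90°, 90°]
PITCH_RATE_LIMIT = 5.0              # assumed max |pitch rate| in rad/s
VOLTAGE_LIMIT = 24.0                # assumed max |motor voltage| in V


def normalize_obs(pitch, pitch_rate, target):
    """Scale each observation component by its assumed limit and clip to [-1, 1]."""
    obs = np.array([
        pitch / PITCH_LIMIT_RAD,
        pitch_rate / PITCH_RATE_LIMIT,
        target / PITCH_LIMIT_RAD,
    ])
    return np.clip(obs, -1.0, 1.0)


def denormalize_action(a):
    """Map a policy output in [-1, 1] to a voltage command (out-of-range outputs are clipped)."""
    return float(np.clip(a, -1.0, 1.0)) * VOLTAGE_LIMIT


def reward(pitch, target, a, action_weight=0.1):
    """Hypothetical shaped reward: negative normalized tracking error minus a quadratic action penalty."""
    error = abs(pitch - target) / PITCH_LIMIT_RAD
    return -error - action_weight * float(a) ** 2
```

For example, a pitch of 45° with a zero target normalizes to `[0.5, 0.0, 0.0]`, and a normalized action of `0.5` maps to 12 V. Penalizing the squared action discourages large or aggressive voltage commands. This matters more on physical hardware than in simulation.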

Abstract

Reinforcement Learning (RL) offers promising solutions for control tasks in industrial cyber-physical systems (ICPSs), yet its real-world adoption remains limited. This paper demonstrates how seemingly small but well-designed modifications to the RL problem formulation can substantially improve performance, stability, and sample efficiency. We identify and investigate key elements of RL problem formulation and show that these enhance both learning speed and final policy quality. Our experiments use a one-degree-of-freedom (1-DoF) helicopter testbed, the Quanser Aero 2, which features non-linear dynamics representative of many industrial settings. In simulation, the proposed problem design principles yield more reliable and efficient training, and we further validate these results by training the agent directly on physical hardware. The encouraging real-world outcomes highlight the potential of RL for ICPS, especially when careful attention is paid to the design principles of problem formulation. Overall, our study underscores the crucial role of thoughtful problem formulation in bridging the gap between RL research and the demands of real-world industrial systems.
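
Among the problem-formulation modifications mentioned above, the TL;DR lists randomized targets and initial states and longer episode horizons. One way to express this as episode bookkeeping is sketched below. The class name `EpisodeConfig`, its parameters, and the defaults (±60° pitch range, 500-step horizon) are hypothetical and are not taken from the paper.

```python
import numpy as np


class EpisodeConfig:
    """Hypothetical episode settings: randomized start and target pitch, fixed horizon."""

    def __init__(self, pitch_limit_rad=np.deg2rad(60.0), horizon=500, seed=None):
        self.pitch_limit = pitch_limit_rad  # assumed symmetric pitch range
        self.horizon = horizon              # assumed episode length in steps
        self.rng = np.random.default_rng(seed)
        self.step_count = 0

    def reset(self):
        """Start a new episode with a uniformly sampled initial pitch and target pitch."""
        self.step_count = 0
        initial = self.rng.uniform(-self.pitch_limit, self.pitch_limit)
        target = self.rng.uniform(-self.pitch_limit, self.pitch_limit)
        return initial, target

    def tick(self):
        """Advance one step; return True once the horizon is reached (time-limit truncation)."""
        self.step_count += 1
        return self.step_count >= self.horizon
```

Sampling a fresh start and target every episode exposes the agent to the whole operating range rather than a single setpoint. A longer horizon gives each episode time to show both the transient response and the steady-state tracking behavior.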

Paper Structure

This paper contains 28 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Average deviation from the target for the "Baseline" and "New setting" configurations on the evaluation profile during training. The "Baseline" and "New setting" configurations were trained on the simulation model for 1 million steps. The plot also includes results for the "New setting with action penalty" configuration, trained on the real system for 250,000 steps.
  • Figure 2: Evaluation on the real system with an agent trained for 250,000 steps, showing the target pitch, actual pitch, and applied voltage.
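
Figure 1 reports the average deviation from the target on an evaluation profile. This summary does not define the metric precisely. The sketch below assumes it is the mean absolute difference between target and actual pitch over the profile. The paper may use another definition, such as RMSE, and the function name `average_deviation` is hypothetical.

```python
import numpy as np


def average_deviation(target, actual):
    """Mean absolute deviation between target and actual pitch samples (assumed metric)."""
    target = np.asarray(target, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if target.shape != actual.shape:
        raise ValueError("target and actual must have the same shape")
    return float(np.mean(np.abs(target - actual)))
```

For instance, targets `[0, 10, 20]` and measured pitches `[1, 8, 20]` give an average deviation of `(1 + 2 + 0) / 3 = 1.0` in the same unit as the inputs.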