
Python-Based Reinforcement Learning on Simulink Models

Georg Schäfer, Max Schirl, Jakob Rehrl, Stefan Huber, Simon Hirlaender

TL;DR

This work addresses training reinforcement learning agents for mechatronic systems by bridging Simulink simulations with Python-based RL libraries. It proposes an end-to-end integration workflow that generates C code from Simulink, compiles it into a DLL, embeds the DLL in Python via ctypes, and wraps the model in a Gymnasium environment to enable training with Stable Baselines3 on the Quanser Aero 2. The results demonstrate that policies trained on the Simulink model can transfer to the real system and that the Python-based approach outperforms a MATLAB-based baseline while reducing development time. The work enables faster, more flexible RL development for complex, nonlinear MIMO systems and paves the way for broader adoption of Python tooling in Simulink-centered control research.
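
The ctypes step of this workflow can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the model was generated with Embedded Coder using the default nonreusable interface, which names the entry points `<model>_initialize`, `<model>_step` and `<model>_terminate` and exposes root-level I/O as the global structs `<model>_U` and `<model>_Y`. It also assumes the DLL exports these symbols (on Windows, for example via a .def file). The model name `aero2` and the field names `V0`, `V1`, `tilt` and `tilt_rate` are hypothetical placeholders that must be replaced with the names and types in the generated header.

```python
import ctypes
from pathlib import Path


class ExtU(ctypes.Structure):
    """Mirror of the generated external-input struct <model>_U.

    Field names, order and C types are placeholders (assumption): they must
    match the generated <model>.h exactly, or the DLL memory is misread.
    """
    _fields_ = [
        ("V0", ctypes.c_double),  # hypothetical voltage command, rotor 0
        ("V1", ctypes.c_double),  # hypothetical voltage command, rotor 1
    ]


class ExtY(ctypes.Structure):
    """Mirror of the generated external-output struct <model>_Y."""
    _fields_ = [
        ("tilt", ctypes.c_double),       # hypothetical measured tilt (rad)
        ("tilt_rate", ctypes.c_double),  # hypothetical tilt rate (rad/s)
    ]


class SimulinkModel:
    """Thin ctypes wrapper around a DLL compiled from Embedded Coder output.

    Assumes the nonreusable ERT interface: <model>_initialize(),
    <model>_step(), <model>_terminate() and the global structs <model>_U
    and <model>_Y exported from the DLL.
    """

    def __init__(self, dll_path: str, model: str = "aero2"):
        # "aero2" is a hypothetical model name; generated symbols carry it as prefix.
        self.lib = ctypes.CDLL(str(Path(dll_path).resolve()))
        self._init = getattr(self.lib, f"{model}_initialize")
        self._step = getattr(self.lib, f"{model}_step")
        self._term = getattr(self.lib, f"{model}_terminate")
        for fn in (self._init, self._step, self._term):
            fn.argtypes = []   # all three entry points take no arguments
            fn.restype = None  # and return void
        # Views onto the DLL's global I/O structs; writes go straight into DLL memory.
        self.u = ExtU.in_dll(self.lib, f"{model}_U")
        self.y = ExtY.in_dll(self.lib, f"{model}_Y")

    def reset(self) -> ExtY:
        # Assumes re-running initialize restores the initial states; verify per model.
        self._init()
        return self.y

    def step(self, v0: float, v1: float) -> ExtY:
        self.u.V0 = v0
        self.u.V1 = v1
        self._step()  # advance the model by one fixed-step sample
        return self.y

    def close(self) -> None:
        self._term()
```

Because `in_dll` returns a live view rather than a copy, a caller would typically read `y.tilt` and `y.tilt_rate` into plain floats right after each `step` call before the next step overwrites them.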

Abstract

This paper proposes a framework for training Reinforcement Learning agents using Python in conjunction with Simulink models. Leveraging Python's superior customization options and popular libraries like Stable Baselines3, we aim to bridge the gap between the established Simulink environment and the flexibility of Python for training bleeding-edge agents. Our approach is demonstrated on the Quanser Aero 2, a versatile dual-rotor helicopter. We show that policies trained on Simulink models can be seamlessly transferred to the real system, enabling efficient development and deployment of Reinforcement Learning agents for control tasks. Through systematic integration steps, including C-code generation from Simulink, DLL compilation, and Python interface development, we establish a robust framework for training agents on Simulink models. Experimental results demonstrate the effectiveness of our approach, which surpasses previous MATLAB-based efforts, and highlight the potential of combining Simulink with Python for Reinforcement Learning research and applications.
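
The Gymnasium wrapping step can be illustrated with a dependency-free sketch of Gymnasium's `reset`/`step` contract: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. Everything specific here is hypothetical and not taken from the paper: the names `toy_plant` and `TiltTrackingEnv`, the stand-in dynamics, the observation layout, the target-tilt range and the squared-error reward. In the workflow described above, the `plant` callable would wrap the ctypes call to the generated model step function instead of `toy_plant`.

```python
import random
from typing import Callable, List, Optional

# Hypothetical plant contract: (state, action, dt) -> next state.
# In the described workflow this would wrap the DLL's <model>_step() call.
Plant = Callable[[List[float], float, float], List[float]]


def toy_plant(state: List[float], action: float, dt: float) -> List[float]:
    """Stand-in damped second-order tilt dynamics (illustrative, not the Aero 2)."""
    tilt, rate = state
    acc = 5.0 * action - 2.0 * rate - 1.0 * tilt
    rate += acc * dt   # semi-implicit Euler: update the rate first
    tilt += rate * dt  # then integrate tilt with the new rate
    return [tilt, rate]


class TiltTrackingEnv:
    """Environment following Gymnasium's reset/step contract.

    A real version would subclass gymnasium.Env and declare
    observation_space and action_space; this sketch keeps only the call
    contract so it runs without third-party packages.
    """

    def __init__(self, plant: Plant, dt: float = 0.01, max_steps: int = 500,
                 max_action: float = 1.0):
        self.plant = plant
        self.dt = dt
        self.max_steps = max_steps
        self.max_action = max_action
        self.rng = random.Random()
        self.state = [0.0, 0.0]  # [tilt, tilt_rate]; hypothetical layout
        self.target = 0.0        # target tilt r
        self.t = 0

    def _obs(self) -> List[float]:
        tilt, rate = self.state
        return [tilt, rate, self.target - tilt]

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        if seed is not None:
            self.rng.seed(seed)
        self.state = [0.0, 0.0]
        self.target = self.rng.uniform(-0.5, 0.5)  # hypothetical range (rad)
        self.t = 0
        return self._obs(), {}

    def step(self, action: float):
        a = max(-self.max_action, min(self.max_action, float(action)))  # clip action
        self.state = self.plant(self.state, a, self.dt)
        self.t += 1
        error = self.target - self.state[0]
        reward = -error ** 2  # hypothetical reward: negative squared tracking error
        terminated = False    # no failure condition in this sketch
        truncated = self.t >= self.max_steps  # episode time limit
        return self._obs(), reward, terminated, truncated, {}
```

With Gymnasium and Stable Baselines3 installed, one would instead subclass `gymnasium.Env`, declare `gymnasium.spaces.Box` observation and action spaces, return `numpy.float32` arrays as observations, validate the class with `check_env` from `stable_baselines3.common.env_checker`, and train with an algorithm such as `PPO("MlpPolicy", env).learn(total_timesteps=N)`. Which algorithm and reward the authors used is not stated here.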

Paper Structure

This paper contains 9 sections and 3 figures.

Figures (3)

  • Figure 1: The Quanser Aero 2 (left) and its schematic representation (right) in a 1-DOF configuration.
  • Figure 2: Mean episode return (in dark blue) with minimum and maximum (light blue area) of five training runs of the agent performed in the simulation of the Quanser Aero 2 system using MATLAB's Simulink in conjunction with Gymnasium.
  • Figure 3: Evaluation runs on the simulation and real system illustrating the behavior of the actual tilt $\varTheta_{\text{sim}}$ and $\varTheta_{\text{real}}$ for simulation and real system, respectively, using a greedy policy and a dynamic target tilt $r$.
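
Figure 2 summarizes five training runs as a mean return curve with a minimum/maximum band. The aggregation behind such a plot can be sketched as below, assuming each run logged its episode return at the same evaluation points so that values at index k are comparable across runs. The helper name `return_band` is hypothetical; the resulting band could then be drawn with matplotlib's `fill_between`.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def return_band(runs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], List[float]]:
    """Per-evaluation-point mean, minimum and maximum across training runs.

    Assumes every run was evaluated at the same points, so runs[i][k]
    refers to the same training progress for every run i.
    """
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs[0])
    if any(len(r) != n for r in runs):
        raise ValueError("all runs must share the same evaluation points")
    columns = list(zip(*runs))  # columns[k] holds the returns of all runs at point k
    return ([mean(c) for c in columns],
            [min(c) for c in columns],
            [max(c) for c in columns])
```

The mean gives the dark line and the per-point minimum and maximum bound the shaded area, so the band widens wherever the runs disagree.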