LExCI: A Framework for Reinforcement Learning with Embedded Systems

Kevin Badalian; Lucas Koch; Tobias Brinkmann; Mario Picerno; Marius Wegener; Sung-Yong Lee; Jakob Andert

TL;DR

This paper presents a framework named LExCI, the Learning and Experiencing Cycle Interface, which provides end-users with a free and open-source tool for training agents on embedded systems using the open-source library RLlib.

Abstract

Advances in artificial intelligence (AI) have led to its application in many areas of everyday life. In the context of control engineering, reinforcement learning (RL) is a particularly promising approach because it is centred around the idea of letting an agent interact freely with its environment to find an optimal strategy. One challenge professionals face when training and deploying RL agents is that the agents often have to run on dedicated embedded devices, for example to integrate them into an existing toolchain or to satisfy performance criteria such as real-time constraints. Conventional RL libraries, however, cannot easily be used with that kind of hardware. In this paper, we present a framework named LExCI, the Learning and Experiencing Cycle Interface, which bridges this gap and provides end-users with a free and open-source tool for training agents on embedded systems using the open-source library RLlib. Its operability is demonstrated with two state-of-the-art RL algorithms (Proximal Policy Optimization and Deep Deterministic Policy Gradient) and a rapid control prototyping system.
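The Figure 1 and Figure 2 captions describe LExCI's core loop: a Master and one or more Minions (the data generation domain, which can be parallelised) exchange data in a repeating cycle of experiencing and learning. The following plain-Python sketch illustrates only that data flow, under stated assumptions. The classes `Master`, `Minion` and `Experience`, the toy 1-D system standing in for the embedded target, and the random-search update standing in for RLlib's PPO or DDPG training are all hypothetical; none of them is LExCI's or RLlib's actual API.

```python
import random
from dataclasses import dataclass


@dataclass
class Experience:
    """One transition (state, action, reward); a hypothetical record type."""
    state: float
    action: float
    reward: float


class Minion:
    """Hypothetical data-generation instance.

    Stands in for the side that drives the embedded target. Here it simply
    simulates a toy 1-D system x_next = x + a in plain Python.
    """

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)

    def generate_episode(self, gain: float, steps: int = 20) -> list[Experience]:
        x = self.rng.uniform(-1.0, 1.0)
        episode = []
        for _ in range(steps):
            action = max(-1.0, min(1.0, -gain * x))  # clipped linear policy
            x_next = x + action
            episode.append(Experience(x, action, -x_next ** 2))
            x = x_next
        return episode


class Master:
    """Hypothetical learning side: proposes a policy, then trains on experiences.

    The update is a (1+1) random search on a scalar gain, a deliberately
    simple stand-in for the RL algorithms used in the paper.
    """

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.best_gain = 0.0
        self.best_return = float("-inf")
        self.candidate = 0.0

    def propose_policy(self) -> float:
        # Perturb the best gain found so far and send it out for evaluation.
        self.candidate = self.best_gain + self.rng.gauss(0.0, 0.2)
        return self.candidate

    def train(self, episodes: list[list[Experience]]) -> float:
        # Average undiscounted return over all episodes of this cycle.
        total = sum(sum(exp.reward for exp in ep) for ep in episodes)
        avg_return = total / len(episodes)
        # Keep the candidate only if it beat the best return seen so far.
        if avg_return > self.best_return:
            self.best_return = avg_return
            self.best_gain = self.candidate
        return avg_return


def run_cycles(num_cycles: int = 50, num_minions: int = 3) -> Master:
    master = Master(seed=0)
    minions = [Minion(seed=i + 1) for i in range(num_minions)]
    for _ in range(num_cycles):
        gain = master.propose_policy()  # Master -> Minions: current policy
        episodes = [m.generate_episode(gain) for m in minions]  # experiencing
        master.train(episodes)  # Minions -> Master: experiences, then learning
    return master


if __name__ == "__main__":
    trained = run_cycles()
    print(f"best gain {trained.best_gain:.3f}, best avg return {trained.best_return:.4f}")
```

In this toy system a gain of 1.0 drives the state to zero in one step, so the search tends to move towards that value, although the noisy per-cycle evaluation means convergence is not guaranteed. The Minions are stepped sequentially here for simplicity; the captions indicate that in the framework they can run as independent parallel instances.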

Paper Structure (19 sections, 14 equations, 16 figures, 3 tables)

Introduction
RL, Control Tasks, and Embedded Systems
Model Execution
Training the Model
Proposed Solution
Theoretical Background
Reinforcement Learning
Algorithms
Proximal Policy Optimization
Deep Deterministic Policy Gradient
Software
Architecture and General Workflow
Setup
Experiments
Pendulum Environment and Setup
...and 4 more sections

Figures (16)

Figure 1: Software architecture of the LExCI framework with the eponymous cycle as a light green arrow. There are multiple independent instances of the data generation domain when the process is parallelised.
Figure 2: Simplified flowchart of the LExCI framework. The grey, dashed arrows indicate communication/data exchange between the Minion and the Master. The blue areas tagged with Roman numerals serve as references for the textual description of the figure.
Figure 3: LExCI's RL Block in Simulink. The ports and are in the denormalised space of the environment.
Figure 4: The pendulum swing-up problem according to gymnasium-pendulum.
Figure 5: Average LExCI PPO training returns with three episodes per cycle. The data has been smoothed with a moving average filter of size 11.
...and 11 more figures
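The Figure 5 caption states that the training returns were smoothed with a moving average filter of size 11. The exact variant used in the paper (trailing or centred window, edge handling) is not given here, so the following is a minimal sketch assuming a plain unweighted window that only produces output where all 11 samples are available; n returns therefore yield n - 10 smoothed values. The function name `moving_average` is illustrative.

```python
def moving_average(values: list[float], window: int = 11) -> list[float]:
    """Unweighted moving average over full windows only ('valid' mode).

    For n input values this returns n - window + 1 smoothed values; inputs
    shorter than the window yield an empty list.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    running = sum(values[:window])  # sum of the first full window
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    returns = [float(v) for v in range(20)]  # made-up returns, not data from the paper
    print(moving_average(returns))  # 10 values: 5.0, 6.0, ..., 14.0
```

The running-sum update keeps the cost linear in the number of episodes; a centred or edge-padded variant would instead keep the output the same length as the input.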
