Reinforcement Learning to improve delta robot throws for sorting scrap metal

Arthur Louette; Gaspard Lambrechts; Damien Ernst; Eric Pirard; Godefroid Dislaire

TL;DR

This work shows the benefits of RL-based PaT compared to the PaP or classical optimization-based PaT techniques used in the industry, and evaluates the performance of RL algorithms.

Abstract

This study proposes a novel approach based on reinforcement learning (RL) to improve the efficiency of sorting scrap metal with delta robots and a Pick-and-Place (PaP) process, which is widely used in the industry. We use three classical model-free RL algorithms (TD3, SAC and PPO) to reduce the time needed to sort metal scraps. Instead of moving to the exact bin location, as in the classical PaP technique, we learn the release position and speed needed to throw an object into a bin. Our contribution is threefold. First, we provide a new simulation environment for learning RL-based Pick-and-Throw (PaT) strategies for parallel grippers. Second, we use RL algorithms to learn this task in this environment, reaching 89% accuracy while increasing throughput by 51% in simulation. Third, we evaluate the performance of the RL algorithms and compare them to a PaP method and a state-of-the-art PaT method, both in simulation and in reality; the policies are learned only in simulation with domain randomisation and are transferred without fine-tuning in reality. This work shows the benefits of RL-based PaT compared to the PaP or classical optimization-based PaT techniques used in the industry.
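
To illustrate why a release position and a release speed are enough to specify a throw, the following minimal sketch computes where a thrown object would land under an idealised, drag-free ballistic model. This is not the simulation environment described in the paper: the function names, the rectangular bin model and the absence of air resistance, gripper dynamics and object rotation are all assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_point(release_pos, release_vel, bin_height=0.0):
    """Return (x, y, t) where a drag-free projectile reaches z = bin_height while falling.

    Hypothetical helper: release_pos and release_vel are (x, y, z) tuples in
    metres and metres per second.
    """
    x0, y0, z0 = release_pos
    vx, vy, vz = release_vel
    # Solve z0 + vz*t - 0.5*G*t^2 = bin_height and keep the later (descending) root.
    disc = vz * vz + 2.0 * G * (z0 - bin_height)
    if disc < 0.0:
        raise ValueError("object never reaches the bin height")
    t = (vz + math.sqrt(disc)) / G
    return x0 + vx * t, y0 + vy * t, t


def lands_in_bin(release_pos, release_vel, bin_center, bin_half_size, bin_height=0.0):
    """Check whether the landing point falls inside an axis-aligned rectangular bin (assumed shape)."""
    x, y, _ = landing_point(release_pos, release_vel, bin_height)
    return (abs(x - bin_center[0]) <= bin_half_size[0]
            and abs(y - bin_center[1]) <= bin_half_size[1])
```

In this simplified picture, a learned policy would output the release position and velocity, and a check such as `lands_in_bin` could stand in for the success signal. Real scrap pieces have irregular shapes and may leave the gripper with some delay, which is one reason an analytical model like this one may not be sufficient on its own.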

Paper Structure (16 sections, 2 equations, 4 figures, 7 tables)

Introduction
Related work
Method
Background
Problem statement
Training
Reward baseline
Hyperparameters optimization
Domain randomization
Model training
Experiments
Simulation experiment
Real-world experiment
Conclusion
Optuna Study
...and 1 more sections

Figures (4)

Figure 1: Photo of the PICKIT project at the GeMMe (Georesources, Mineral Engineering & Extractive Metallurgy) lab with a focus on the ABB IRB 360 Flexpicker robots sorting scrap metal.
Figure 2: Schematic of the PICKIT project with the different sensors (3D, Infrared, X-ray and LIBS) followed by the delta robots sorting the materials.
Figure 3: On the left, the speed profile of a throw; on the right, a schematic of a throw in the environment.
Figure 4: Evolution of the reward for each algorithm (SAC, PPO and TD3) and hyperparameter set (SB3 and Optuna parameters) during training over 500,000 episodes. For each algorithm/parameter pair, the curve shows the mean over three runs and the shaded interval shows the standard deviation across those runs.
