Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Aleksandar Krnjaic; Raul D. Steleac; Jonathan D. Thomas; Georgios Papoudakis; Lukas Schäfer; Andrew Wing Keung To; Kuan-Ho Lao; Murat Cubuktepe; Matthew Haley; Peter Börsting; Stefano V. Albrecht

TL;DR

Hierarchical MARL algorithms are developed in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate).

Abstract

We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms.

Paper Structure

This paper contains 44 sections, 5 equations, 4 figures, 1 table.

INTRODUCTION
Problem Overview
Person-to-Goods (PTG)
Goods-to-Person (GTP)
Motivation
Contribution
RELATED LITERATURE
AGV-Assisted Order-Picking
Multi-Agent Path Finding
Multi-Agent Pickup and Delivery Problem
Multi-Agent Reinforcement Learning
PRELIMINARIES
Warehouse Definition
Objective
PROPOSED APPROACH
...and 29 more sections

Figures (4)

Figure 1: Left: Dematic PTG simulator with human pickers and AGVs. Right: TA-RWARE GTP simulator with picking bots (diamond) and AGVs (hexagon).
Figure 2: Proposed 3-layer manager/worker agent hierarchy. A manager agent observes information about the warehouse state and orders, and assigns a task (target zone in warehouse) to each worker agent. Worker agents receive local observations about the warehouse and the assigned task from the manager, and select an item location from the assigned target zone. A low-level controller then navigates the worker to the selected item location.
Figure 3: Average pick rate (order-lines per hour) in Dematic PTG simulator for heuristics FM/PDM and MARL algorithms IAC, SNAC, SEAC, HIAC (ours), HSNAC (ours), HSEAC (ours). Shaded area shows 95% stratified bootstrap confidence interval (Agarwal et al., 2021), with 300 episode average smoothing.
Figure 4: Average pick rate (order-lines per hour) in TA-RWARE GTP simulator for heuristic CTA and MARL algorithms IAC, SNAC, SEAC, HIAC (ours), HSNAC (ours), HSEAC (ours). Shaded area shows 95% stratified bootstrap confidence interval, with 300 episode average smoothing.
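
The three-layer hierarchy described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the paper's implementation: the learned manager and worker policies are replaced by simple stand-ins (a random zone choice and a nearest-location choice), the warehouse is assumed to be a plain grid, and every name (`Worker`, `manager_assign_zones`, `worker_select_location`, `controller_step`, the zone labels) is hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (row, column) on an assumed grid warehouse


@dataclass
class Worker:
    name: str
    position: Location


def manhattan(a: Location, b: Location) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def manager_assign_zones(workers: List[Worker],
                         zones: Dict[str, List[Location]],
                         rng: random.Random) -> Dict[str, str]:
    """Layer 1 (manager): assign one target zone per worker.

    Stand-in for a learned manager policy: picks a zone at random.
    """
    zone_ids = sorted(zones)
    return {w.name: rng.choice(zone_ids) for w in workers}


def worker_select_location(worker: Worker,
                           zone_locations: List[Location]) -> Location:
    """Layer 2 (worker): choose an item location inside the assigned zone.

    Stand-in for a learned worker policy: picks the nearest location.
    """
    return min(zone_locations, key=lambda loc: manhattan(worker.position, loc))


def controller_step(position: Location, target: Location) -> Location:
    """Layer 3 (low-level controller): move one cell toward the target,
    closing the row gap first and then the column gap."""
    r, c = position
    tr, tc = target
    if r != tr:
        return (r + (1 if tr > r else -1), c)
    if c != tc:
        return (r, c + (1 if tc > c else -1))
    return position


# Hypothetical two-zone warehouse with one AGV and one picker.
zones = {"A": [(0, 0), (0, 3)], "B": [(5, 5), (4, 1)]}
workers = [Worker("agv_0", (1, 1)), Worker("picker_0", (4, 4))]
rng = random.Random(0)

assignment = manager_assign_zones(workers, zones, rng)
for w in workers:
    target = worker_select_location(w, zones[assignment[w.name]])
    while w.position != target:
        w.position = controller_step(w.position, target)
```

The sketch only shows how decisions flow between the layers: the manager narrows each worker's choice to a zone, the worker picks a location within it, and the controller handles navigation. In the paper the first two layers are trained policies, which this sketch does not attempt to reproduce.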
