Learning the Approach During the Short-loading Cycle Using Reinforcement Learning
Carl Borngrund, Ulf Bodin, Henrik Andreasson, Fredrik Sandin
TL;DR
The study addresses automating the short-loading cycle in wheel-loader/dump-truck workflows using reinforcement learning. A PPO-based actor-critic agent is trained in a PyBullet-based simulation to approach a dump truck and lift the bucket, with a task-specific reward driving progress toward the target while maintaining lift. The agent is then deployed on a real Volvo L180H without additional training, demonstrating qualitative transfer despite a ~3 s sensor delay and simplified real-vehicle dynamics. The findings show potential for simulation-to-reality automation in this domain, while highlighting the need for expanded control inputs and higher-fidelity sensing for robust, multi-signal automation in varied environments.
Abstract
The short-loading cycle is a repetitive task performed in high volumes, making it a strong candidate for automation. In the short-loading cycle, an expert operator navigates towards a pile, fills the bucket with material, navigates to a dump truck, and dumps the material into the tipping body. The operator has to balance productivity against fuel consumption to maximise the overall efficiency of the cycle. In addition, difficult interactions, such as the tyre-to-surface interaction, further complicate the cycle. These hard-to-model interactions, which can be difficult to address with rule-based systems, together with the efficiency requirements, motivate us to examine the potential of data-driven approaches. In this paper, we examine whether an agent can be taught through reinforcement learning to approach a dump truck's tipping body and get into position to dump material into it. The agent is trained in a 3D simulated environment to perform a simplified navigation task. The trained agent is then transferred directly to a real vehicle to perform the same task, with no additional training. The results indicate that the agent can successfully learn to navigate towards the dump truck with a limited number of control signals in simulation and, when transferred to a real vehicle, exhibits the correct behaviour.
